Scrapy Xpath Not Extracting Div Containing Special Characters <%=
I am new to Scrapy. I am trying to extract the h2 text from the following URL: 'https://www.tysonprop.co.za/agents/'. I have two problems: my XPath can get to the script element, but
Solution 1:
Does branch.branch_name look like a path into JSON data? Is there a call that loads the data you are looking for? Maybe; let's see.
If you open the Network tab in your browser's developer tools and search through the requests the page makes, you will find this AJAX call, which loads exactly the data you are looking for. So:
import json

import scrapy


class TysonSpider(scrapy.Spider):
    name = 'tyson_spider'

    def start_requests(self):
        # Request the AJAX endpoint directly instead of the HTML page.
        url = 'https://www.tysonprop.co.za/ajax/agents/?branch_id=25'
        yield scrapy.Request(url=url, callback=self.parse)

    def parse(self, response):
        # The response body is JSON, so decode it and read the nested key.
        json_data = json.loads(response.text)
        branch_name = json_data['branch']['branch_name']
        yield {'branchName': branch_name}
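The lookup in parse assumes the response always has a branch object with a branch_name key. As a minimal sketch of a more defensive version that can be tried without running a spider, the helper below returns None when the body is not JSON or the keys are missing. The name extract_branch_name and the sample body are hypothetical and used only for illustration; the real response may carry more or different fields.

```python
import json

# Hypothetical sample body; not copied from the real endpoint.
SAMPLE_BODY = '{"branch": {"branch_name": "Example Branch"}}'


def extract_branch_name(body):
    """Return branch.branch_name from a JSON body, or None if it is absent."""
    try:
        data = json.loads(body)
    except json.JSONDecodeError:
        # The body was not valid JSON (for example, an HTML error page).
        return None
    branch = data.get("branch") if isinstance(data, dict) else None
    if not isinstance(branch, dict):
        return None
    return branch.get("branch_name")


if __name__ == "__main__":
    print(extract_branch_name(SAMPLE_BODY))
```

Inside the spider, the same helper could be called as extract_branch_name(response.text) before yielding the item.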
Solution 2:
The div inside the script tag is plain text, not part of the parsed HTML tree, so XPath cannot step into it.
To query it as HTML, you can do the following:
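from scrapy.selector import Selector
....
    def parse(self, response):
        # The template inside the script tag is plain text, so parse it again as HTML.
        template = response.xpath('//script[@id="id_branch_template"]/text()').get()
        script = Selector(text=template)
        # Selector(text=...) wraps the fragment in html/body, so search descendants.
        div = script.xpath('.//div[contains(@class,"branch-container")]')
        h2 = div.xpath('.//h2[contains(@class,"branch-name")]/text()').getall()
        yield {'branchName': h2}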
But please note: the h2 doesn't contain any text, so your result will be an empty list.
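To see why the div is out of reach until the script text is parsed a second time, here is a small sketch that uses Python's standard html.parser instead of Scrapy's Selector (a swapped-in tool, chosen only so the example runs without extra packages). The page fragment is a hypothetical stand-in modelled on the template described above, and the type attribute on the script tag is an assumption.

```python
from html.parser import HTMLParser


class TagCollector(HTMLParser):
    """Records every start tag seen and any text found inside <script>."""

    def __init__(self):
        super().__init__()
        self.tags = []
        self.script_text = []
        self._in_script = False

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)
        if tag == "script":
            self._in_script = True

    def handle_endtag(self, tag):
        if tag == "script":
            self._in_script = False

    def handle_data(self, data):
        if self._in_script:
            self.script_text.append(data)


def collect(markup):
    """Parse markup and return (start tags seen, text found inside script)."""
    parser = TagCollector()
    parser.feed(markup)
    parser.close()
    return parser.tags, "".join(parser.script_text)


# Hypothetical page fragment; the real page's markup may differ.
PAGE = (
    '<html><body>'
    '<script id="id_branch_template" type="text/template">'
    '<div class="branch-container"><h2 class="branch-name"></h2></div>'
    '</script>'
    '</body></html>'
)

# First pass: the parser reports the script tag but not the div or h2 inside it.
outer_tags, template = collect(PAGE)
# Second pass: parsing the script's text on its own exposes the div and h2.
inner_tags, _ = collect(template)

if __name__ == "__main__":
    print(outer_tags)   # ['html', 'body', 'script']
    print(inner_tags)   # ['div', 'h2']
```

The second pass plays the same role as Selector(text=...) in the answer: it turns the script's text back into something that can be queried as markup.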