
Scrapy Xpath Not Extracting Div Containing Special Characters <%=

I am new to Scrapy. I am trying to extract the h2 text from the following URL: 'https://www.tysonprop.co.za/agents/' I have 2 problems: My xpath can get to the script element, but

Solution 1:

Does branch.branch_name look like a key path in JSON data? Is there a call that loads the data you are looking for? Maybe. Let's see.

If you open your browser's developer tools and look through the requests in the Network tab, you will find an AJAX call that loads exactly the data you are looking for. So:

import json

import scrapy


class TysonSpider(scrapy.Spider):
    name = 'tyson_spider'

    def start_requests(self):
        # AJAX endpoint found in the browser's Network tab
        url = 'https://www.tysonprop.co.za/ajax/agents/?branch_id=25'
        yield scrapy.Request(url=url, callback=self.parse)

    def parse(self, response):
        # The endpoint returns JSON, so no XPath is needed here
        json_data = json.loads(response.text)
        branch_name = json_data['branch']['branch_name']
        yield {'branchName': branch_name}
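
The parsing step in that callback can be tried without touching the network. Below is a minimal sketch, assuming the response keeps the branch / branch_name shape used above. The sample payload and the extract_branch_name helper are made up for illustration and are not taken from the real endpoint:

```python
import json


def extract_branch_name(body):
    """Return branch.branch_name from a JSON body, or None if it is missing."""
    data = json.loads(body)
    # Fall back to an empty dict so a missing or null "branch" does not raise.
    branch = data.get("branch") or {}
    return branch.get("branch_name")


# Hypothetical payload, not copied from the real endpoint.
sample = '{"branch": {"branch_name": "Example Branch"}}'
print(extract_branch_name(sample))
print(extract_branch_name('{"other": 1}'))
```

Using .get instead of direct indexing is a design choice: if the endpoint ever omits the branch, the spider yields None rather than crashing with a KeyError.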

Solution 2:

The div inside the script tag is plain text, not parsed HTML. To query it as HTML, you can do the following:

from scrapy.selector import Selector

....
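    def parse(self, response):
        # The template is plain text inside <script>, so re-parse it as HTML
        script = Selector(text=response.xpath('//script[@id="id_branch_template"]/text()').get())
        # Selector(text=...) wraps the fragment in <html><body>, so search with //
        div = script.xpath('//div[contains(@class,"branch-container")]')
        h2 = div.xpath('.//h2[contains(@class,"branch-name")]/text()').getall()
        yield {'branchName': h2}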

But please note: the h2 doesn't contain any text, so your result will be an empty list.
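
One likely reason the h2 yields nothing useful is that the template only holds a placeholder which the page fills in client-side. Below is a rough sketch using only the standard library. It assumes the template contains a <%= branch.branch_name %> style placeholder, as the question title and Solution 1 suggest. The template string and the find_placeholders helper are hypothetical:

```python
import re

# Hypothetical template text, standing in for the contents of the
# <script id="id_branch_template"> element (not copied from the real page).
template = (
    '<div class="branch-container">'
    '<h2 class="branch-name"><%= branch.branch_name %></h2>'
    '</div>'
)

# Match <%= ... %> placeholders and capture the expression inside them.
PLACEHOLDER = re.compile(r"<%=\s*(.*?)\s*%>")


def find_placeholders(text):
    """Return the expressions found inside <%= ... %> placeholders."""
    return PLACEHOLDER.findall(text)


print(find_placeholders(template))
```

If the placeholders name fields such as branch.branch_name, that is a hint the real values arrive separately, for example through the AJAX call in Solution 1.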
