Scrapy Xpath Not Extracting Div Containing Special Characters <%=

July 31, 2024

I am new to Scrapy. I am trying to extract the h2 text from the following URL: 'https://www.tysonprop.co.za/agents/'. I have 2 problems: My XPath can get to the script element, but

Solution 1:

Doesn't branch.branch_name look like an address into JSON data? Is there a call that loads the data you are looking for? Maybe. Let's see.

If you open your browser's developer tools and search through the requests in the Network tab, you will find an AJAX call that loads exactly the data you are looking for. So:

import json
import scrapy

class TysonSpider(scrapy.Spider):
    name = 'tyson_spider'

    def start_requests(self):
        url = 'https://www.tysonprop.co.za/ajax/agents/?branch_id=25'
        yield scrapy.Request(url=url, callback=self.parse)

    def parse(self, response):
        json_data = json.loads(response.text)
        branch_name = json_data['branch']['branch_name']
        yield {'branchName': branch_name}
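Note that json_data['branch']['branch_name'] raises a KeyError if the endpoint ever returns a payload without those keys. Below is a minimal sketch of a more defensive lookup, assuming the response has the shape implied by the spider above. The helper name extract_branch_name and the sample payloads are hypothetical and not taken from the site:

```python
import json


def extract_branch_name(body):
    """Return branch.branch_name from a JSON body, or None if it is absent."""
    data = json.loads(body)
    if not isinstance(data, dict):
        return None
    branch = data.get("branch")
    if not isinstance(branch, dict):
        return None
    return branch.get("branch_name")


# Hypothetical payloads; the real response likely carries more fields.
sample = '{"branch": {"branch_name": "Example Branch"}}'
print(extract_branch_name(sample))
print(extract_branch_name('{"agents": []}'))
```

Inside the spider, parse could call this helper with response.text and skip the item when it returns None.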

Solution 2:

The div inside the script tag is plain text, not parsed HTML. To query it as HTML, you can do the following:

from scrapy.selector import Selector

....
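    def parse(self, response):
        # The script element's content is plain text, so parse it again as HTML.
        script = Selector(text=response.xpath('//script[@id="id_branch_template"]/text()').get())
        # Selector(text=...) wraps the fragment in html/body, so search all descendants.
        div = script.xpath('.//div[contains(@class,"branch-container")]')
        h2 = div.xpath('.//h2[contains(@class,"branch-name")]/text()').extract()
        yield {'branchName': h2}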

Note, however, that the h2 doesn't contain any text, so your result will be an empty list.
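The <%= marker in the title suggests that the script element holds a client-side template whose values are filled in later from JSON. As a minimal sketch, assuming placeholders of the form <%= name %>, the following lists the names a template refers to. The template string and the helper name find_placeholders are made up for illustration, and only the standard library is used:

```python
import re

# Hypothetical template text, standing in for the script element's contents.
template = (
    '<div class="branch-container">'
    '<h2 class="branch-name"><%= branch.branch_name %></h2>'
    '</div>'
)

# Capture whatever sits between "<%=" and "%>", ignoring surrounding spaces.
PLACEHOLDER = re.compile(r"<%=\s*(.*?)\s*%>")


def find_placeholders(text):
    """Return the expressions referenced by <%= ... %> placeholders."""
    return PLACEHOLDER.findall(text)


print(find_placeholders(template))
```

For this sample the helper returns ['branch.branch_name'], which is the same address that Solution 1 reads from the JSON response.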
