How to Use HtmlAgilityPack to Extract HTML Data
I am learning to write a web crawler and found some great examples to get me started, but since I am new to this, I have a few questions regarding the coding method. The search re
Solution 1:
This little snippet should get you started:
using System.Net;        // WebClient
using HtmlAgilityPack;   // HtmlDocument, HtmlNodeCollection
HtmlDocument doc = new HtmlDocument();
WebClient client = new WebClient();
string html = client.DownloadString("http://www.nysed.gov/coms/op001/opsc2a?profcd=67&plicno=001475&namechk=WIL");
doc.LoadHtml(html);
HtmlNodeCollection nodes = doc.DocumentNode.SelectNodes("//div");
You basically use the WebClient class to download the HTML file and then load that HTML into the HtmlDocument object. Then you use XPath to query the DOM tree and search for nodes. In the above example, "nodes" will include all the div elements in the document.
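Once you have the nodes, you will usually want to loop over them and read their text. Here is a minimal sketch of that step. It parses an inline HTML string instead of downloading a page so it can run offline; the DivExtractor class, the ExtractDivTexts method, and the sample markup are made up for illustration. One thing to watch for: SelectNodes returns null rather than an empty collection when nothing matches, so check for that before iterating.

```csharp
using System;
using System.Collections.Generic;
using HtmlAgilityPack;

static class DivExtractor
{
    // Hypothetical helper: returns the trimmed text of every <div> in the HTML.
    public static List<string> ExtractDivTexts(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        var texts = new List<string>();

        // SelectNodes returns null (not an empty collection) when nothing matches.
        HtmlNodeCollection nodes = doc.DocumentNode.SelectNodes("//div");
        if (nodes == null)
        {
            return texts;
        }

        foreach (HtmlNode node in nodes)
        {
            // InnerText is the node's text with the markup stripped out.
            texts.Add(node.InnerText.Trim());
        }
        return texts;
    }

    static void Main()
    {
        // Sample markup, assumed for illustration only.
        string html = "<html><body><div>Name: A</div><div>License: 1</div></body></html>";

        foreach (string text in ExtractDivTexts(html))
        {
            Console.WriteLine(text);
        }
    }
}
```

To run this against the live page, pass the string returned by client.DownloadString(...) to ExtractDivTexts instead of the sample markup.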
Here's a quick reference about the XPath syntax: http://msdn.microsoft.com/en-us/library/ms256086(v=vs.110).aspx
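To give a feel for what more selective XPath queries look like, here is a rough sketch with three common patterns: matching on an attribute value, requiring that an attribute exists, and picking a child by position. The XPathDemo class, the SelectTexts helper, and the sample markup are hypothetical; the real page will need expressions that match its own structure.

```csharp
using System;
using System.Collections.Generic;
using HtmlAgilityPack;

static class XPathDemo
{
    // Hypothetical helper: trimmed inner text of every node matching the XPath.
    public static List<string> SelectTexts(string html, string xpath)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        var results = new List<string>();
        HtmlNodeCollection nodes = doc.DocumentNode.SelectNodes(xpath);
        if (nodes != null) // null means the expression matched nothing
        {
            foreach (HtmlNode node in nodes)
            {
                results.Add(node.InnerText.Trim());
            }
        }
        return results;
    }

    static void Main()
    {
        // Sample markup, assumed for illustration only.
        string html =
            "<div id='main'><a href='/a'>First</a><a>No link</a>" +
            "<table><tr><td>Name</td><td>Value</td></tr></table></div>";

        // Elements whose attribute has a specific value.
        Console.WriteLine(SelectTexts(html, "//div[@id='main']").Count);

        // Only anchors that actually carry an href attribute.
        Console.WriteLine(string.Join(", ", SelectTexts(html, "//a[@href]")));

        // The second cell of every table row.
        Console.WriteLine(string.Join(", ", SelectTexts(html, "//tr/td[2]")));
    }
}
```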