scrapy response.xpath returns empty array on xml document with default namespace, while response.re works


The problem is caused by the default namespace declared on the root element of the XML:

xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"

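So in that XML, the root element and all of its unprefixed descendants implicitly belong to that namespace.

XPath (version 1.0, which Scrapy's selectors use) has no notion of a default namespace. To reference an element that is in a namespace, you must use a prefix that is bound to that namespace URI; an unprefixed name such as loc only matches elements in no namespace, which is why the query returns an empty list.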
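To see the mismatch outside of Scrapy, here is a minimal sketch using the standard library's xml.etree.ElementTree (not Scrapy's selector, purely to illustrate the namespace behaviour). The sitemap content and the prefix `d` are made up for the example.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"

# Hypothetical sitemap, invented for illustration.
xml_text = f"""<urlset xmlns="{SITEMAP_NS}">
  <url><loc>https://example.com/a</loc></url>
  <url><loc>https://example.com/b</loc></url>
</urlset>"""

root = ET.fromstring(xml_text)

# The parser stores each element under its namespace-qualified name,
# even though the document never writes a prefix.
print(root.tag)

# An unprefixed path looks for "loc" in no namespace, so nothing matches.
unprefixed = root.findall(".//loc")

# Binding a prefix (here "d", an arbitrary choice) to the namespace URI
# makes the same path match.
prefixed = root.findall(".//d:loc", namespaces={"d": SITEMAP_NS})
locs = [el.text for el in prefixed]

print(unprefixed)
print(locs)
```

The prefix itself is arbitrary; only the URI it is bound to has to match the one declared in the document.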

You can use selector.register_namespace() to bind a prefix of your choice to the default namespace URI, and then use that prefix in your XPath:

response.selector.register_namespace('d', 'http://www.sitemaps.org/schemas/sitemap/0.9')
response.xpath('//d:loc')


You can also match elements by their local name, ignoring the namespace entirely, with the XPath local-name() function:

response.xpath("//*[local-name()='loc']")

This is especially useful if you are parsing responses from multiple heterogeneous sources and you don't want to register each and every namespace.