scrapy response.xpath returns empty array on xml document with default namespace, while response.re works
The problem is caused by the default namespace declared on the root element of the XML:
xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
So in that XML, the root element and all of its unprefixed descendants implicitly inherit that same namespace.
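To make the inherited namespace visible, here is a minimal sketch using only the standard library's xml.etree.ElementTree, assuming a small hypothetical sitemap (only the xmlns value comes from the question). ElementTree reports each element name as {namespace-uri}local-name, so every tag carries the sitemap namespace even though none of them has a prefix:

```python
import xml.etree.ElementTree as ET

# Hypothetical two-entry sitemap; only the xmlns value comes from the question.
SITEMAP = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/a</loc></url>
  <url><loc>https://example.com/b</loc></url>
</urlset>"""

root = ET.fromstring(SITEMAP)

# root.iter() walks the root and all descendants in document order.
# Each tag is printed in {namespace-uri}local-name form, showing that the
# default namespace was inherited by every unprefixed element.
for elem in root.iter():
    print(elem.tag)
```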
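XPath 1.0 (the version lxml, and therefore Scrapy, implements), on the other hand, has no implied default namespace: an unprefixed name such as loc only matches elements that are in no namespace at all. To reference an element that is in a namespace, you need a prefix bound to that namespace's URI.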
You can use selector.register_namespace() to bind a prefix of your choice to the default namespace URI, and then use that prefix in your XPath:
response.selector.register_namespace('d', 'http://www.sitemaps.org/schemas/sitemap/0.9')
response.xpath('//d:loc')
You can also ignore namespaces altogether by matching on the element's local name with the XPath local-name() function:
response.xpath("//*[local-name()='loc']")
This is especially useful if you are parsing responses from multiple heterogeneous sources and you don't want to register each and every namespace.