Foros del Web - View Single Message

razpeitia · #3 · 01/09/2011, 09:28

Don't use regular expressions to parse HTML. Use an HTML parser such as lxml instead.

Python code:

from lxml import etree
 
s = u"""<a href="index.php?id=1111">23/08/1980&nbsp;</a>
<a href="index.php?id=1111">CARLOS RIQUELME &nbsp;</a>
 
<a href="index.php?id=1112">20-04-1983</a>
<a href="index.php?id=1112">Luis Sobarso</a>
 
<a href="index.php?id=1113">11/03</a>
<a href="index.php?id=1113">
                
                    Ana Lopez
                
            
</a>"""
 
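# Parse the fragment with lxml's lenient HTML parser, then visit every <a> element.
root = etree.HTML(s)
for i in root.xpath("//a"):
    # Python 3 print call; i.text still includes surrounding whitespace and &nbsp;.
    print(i.get('href'), i.text)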
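
If lxml isn't installed, the same idea (let a parser find the links, not a regex) can be sketched with the standard library's `html.parser`. This is only a rough sketch, not part of the original answer: the names `LinkCollector` and `collect_links` are made up for illustration, and it assumes the links are not nested. It also collapses the stray whitespace and `&nbsp;` that show up in the link text.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Hypothetical helper: collect (href, text) pairs for every <a> element."""

    def __init__(self):
        # convert_charrefs=True turns &nbsp; into the character U+00A0
        super().__init__(convert_charrefs=True)
        self.links = []        # finished (href, text) pairs
        self._href = None      # href of the <a> currently open
        self._in_link = False  # True while inside an <a> ... </a>
        self._chunks = []      # text pieces seen inside the open <a>

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._in_link = True
            self._href = dict(attrs).get("href")
            self._chunks = []

    def handle_data(self, data):
        if self._in_link:
            self._chunks.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._in_link:
            # split() with no arguments splits on any Unicode whitespace,
            # including U+00A0, so this trims and collapses the text.
            text = " ".join("".join(self._chunks).split())
            self.links.append((self._href, text))
            self._in_link = False


def collect_links(html):
    """Return a list of (href, cleaned text) tuples found in html."""
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    return parser.links


if __name__ == "__main__":
    sample = '<a href="index.php?id=1112">20-04-1983</a>\n<a href="index.php?id=1112">Luis Sobarso</a>'
    for href, text in collect_links(sample):
        print(href, text)
```

Since the date and the name share the same `href`, the pairs returned by `collect_links` can then be grouped by that value to line up each date with its person.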