Extracting links from an HTML document

SeNNiNG · #1 · 05/02/2010, 05:04

Hi,
I need to extract all the links from an HTML document, and for that I use the following function:

Code:

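import java.io.StringReader;
import java.util.LinkedList;
import javax.swing.text.SimpleAttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLDocument;
import javax.swing.text.html.HTMLEditorKit;

public static LinkedList<String> getLinks(String texto) {
    LinkedList<String> result = new LinkedList<String>();
    try {
        // Parse the HTML string into a Swing HTMLDocument
        HTMLEditorKit kit = new HTMLEditorKit();
        HTMLDocument doc = (HTMLDocument) kit.createDefaultDocument();
        // Ignore <meta> charset declarations instead of throwing ChangedCharSetException
        doc.putProperty("IgnoreCharsetDirective", Boolean.TRUE);
        StringReader sr = new StringReader(texto);
        kit.read(sr, doc, 0);

        // Iterate over every <a> element in the document
        HTMLDocument.Iterator it = doc.getIterator(HTML.Tag.A);
        while (it.isValid()) {
            SimpleAttributeSet s = (SimpleAttributeSet) it.getAttributes();
            String link = (String) s.getAttribute(HTML.Attribute.HREF);
            // Add the result to the list, skipping any link that mentions localhost
            // (indexOf returns -1 when "localhost" does not occur)
            if (link != null && link.indexOf("localhost") < 0) {
                result.add(link);
            }
            it.next();
        }
    } catch (Exception ex) {
        System.out.println(ex);
        return null;
    }
    return result;
}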
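A note on the localhost filter: `String.indexOf` returns -1 when the substring is absent and 0 when it sits at the very start of the string, so a condition of the form `indexOf("localhost") <= 0` still keeps links such as `localhost/admin`, while `indexOf(...) < 0` (or `!contains(...)`) drops every link that mentions localhost. A minimal sketch comparing both checks; the class `LocalhostFilter` and the helper `isExternal` are hypothetical names used only for illustration:

```java
public class LocalhostFilter {

    // Hypothetical helper: true when the link does not mention "localhost" anywhere
    static boolean isExternal(String link) {
        return !link.contains("localhost");
    }

    public static void main(String[] args) {
        String[] links = {"http://example.com/", "http://localhost/x", "localhost/admin"};
        for (String link : links) {
            int pos = link.indexOf("localhost"); // -1 if absent, 0 if at the start
            System.out.println(link
                    + " indexOf=" + pos
                    + " (<= 0)=" + (pos <= 0)
                    + " isExternal=" + isExternal(link));
        }
    }
}
```

For `localhost/admin` the `<= 0` test is true (the link would be kept), whereas `isExternal` returns false.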

But when I run it for <LINK /> instead of <A></A>, I get this: java.lang.ClassCastException: javax.swing.text.html.HTMLDocument$RunElement cannot be cast to javax.swing.text.SimpleAttributeSet

Thanks in advance
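The exception points at the cast to `SimpleAttributeSet`. `HTMLDocument.Iterator.getAttributes()` is declared to return the `AttributeSet` interface. For `<a>` the attributes are stored as a nested set that happens to be a `SimpleAttributeSet`, but for a tag like `<link>` the iterator appears to hand back the element itself (the `HTMLDocument$RunElement` in the message), which implements `AttributeSet` without being a `SimpleAttributeSet`. A minimal sketch, assuming the same `HTMLEditorKit` parsing as the function above, that reads the attributes through the interface instead of casting; the class `LinkExtractor` and the extra `tag` parameter are hypothetical, and the localhost filter is left out for brevity:

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.swing.text.AttributeSet;
import javax.swing.text.BadLocationException;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLDocument;
import javax.swing.text.html.HTMLEditorKit;

public class LinkExtractor {

    // Hypothetical helper: collects the HREF of every element of the given tag
    // (e.g. HTML.Tag.A or HTML.Tag.LINK) visited by HTMLDocument's tag iterator
    public static List<String> getLinks(String html, HTML.Tag tag)
            throws IOException, BadLocationException {
        HTMLEditorKit kit = new HTMLEditorKit();
        HTMLDocument doc = (HTMLDocument) kit.createDefaultDocument();
        // Ignore <meta> charset declarations instead of throwing ChangedCharSetException
        doc.putProperty("IgnoreCharsetDirective", Boolean.TRUE);
        kit.read(new StringReader(html), doc, 0);

        List<String> result = new ArrayList<>();
        for (HTMLDocument.Iterator it = doc.getIterator(tag); it.isValid(); it.next()) {
            // No cast: works whether the iterator returns a SimpleAttributeSet
            // or the element (RunElement) itself, since both are AttributeSets
            AttributeSet attrs = it.getAttributes();
            Object href = attrs.getAttribute(HTML.Attribute.HREF);
            if (href != null) {
                result.add(href.toString());
            }
        }
        return result;
    }

    public static void main(String[] args) throws Exception {
        String html = "<html><head><link rel=\"icon\" href=\"/favicon.ico\"></head>"
                + "<body><a href=\"http://example.com/\">x</a></body></html>";
        System.out.println(getLinks(html, HTML.Tag.A));
        System.out.println(getLinks(html, HTML.Tag.LINK));
    }
}
```

Declaring the variable as `AttributeSet` keeps the loop the same for both tags; only the `HTML.Tag` passed to `getIterator` changes.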