I've just read up on what this tool does and watched the video, and my jaw dropped. It runs practically a complete audit of a website. I'm going to install it at home as soon as I get there!!
http://www.screamingfrog.co.uk/seo-spider/
A summary of what it does (copied from the tool's website):
Errors – Client & server errors (No responses, 4XX, 5XX)
Redirects – (3XX, permanent or temporary)
External Links – All followed links and their subsequent status codes
URI Issues – Non-ASCII characters, underscores, uppercase characters, dynamic URIs, URIs longer than 115 characters
Duplicate Pages – Hash value / MD5 checksum lookup for pages with duplicate content
Page Title – Missing, duplicate, over 70 characters, same as h1, multiple
Meta Description – Missing, duplicate, over 156 characters, multiple
Meta Keywords – Mainly for reference as it’s only (barely) used by Yahoo.
H1 – Missing, duplicate, over 70 characters, multiple
H2 – Missing, duplicate, over 70 characters, multiple
Meta Robots – Index, noindex, follow, nofollow, noarchive, nosnippet, noodp, noydir etc
Meta Refresh – Including target page and time delay
Canonical link element & canonical HTTP headers
X-Robots-Tag
File Size
Page Depth Level
Inlinks – All pages linking to a URI
Outlinks – All pages a URI links out to
Anchor Text – All link text. Alt text from images with links
Follow & Nofollow – At link level (true/false)
Images – All URIs with the image link & all images from a given page. Images over 100 KB, missing alt text, alt text over 100 characters
User-Agent Switcher – Crawl as Googlebot, Bingbot, or Yahoo! Slurp
Custom Source Code Search – The spider allows you to find anything you want in the source code of a website! Whether that’s analytics code, specific text, or code etc. (Please note – This is not a data extraction or scraping feature yet.)
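
To get a feel for what a couple of these checks involve, here is a rough Python sketch of the "URI Issues" and "Duplicate Pages" items from the list above. This is my own illustration, not Screaming Frog's code. The function names (`uri_issues`, `duplicate_pages`) are made up, and two rules are assumptions on my part: that "dynamic" means the URI has a query string, and that uppercase is only checked in the path. The 115-character limit and the MD5 comparison come from the list itself.

```python
import hashlib
from urllib.parse import urlsplit

MAX_URI_LENGTH = 115  # threshold quoted in the feature list


def uri_issues(uri: str) -> list[str]:
    """Return the labels of the URI issues that apply to `uri`."""
    issues = []
    parts = urlsplit(uri)
    if not uri.isascii():
        issues.append("non-ascii")
    if "_" in uri:
        issues.append("underscore")
    # Assumption: only the path is checked for uppercase characters.
    if any(c.isupper() for c in parts.path):
        issues.append("uppercase")
    # Assumption: "dynamic" means the URI carries a query string.
    if parts.query:
        issues.append("dynamic")
    if len(uri) > MAX_URI_LENGTH:
        issues.append("too-long")
    return issues


def duplicate_pages(pages: dict[str, bytes]) -> list[list[str]]:
    """Group URIs whose response bodies share the same MD5 digest."""
    by_digest: dict[str, list[str]] = {}
    for uri, body in pages.items():
        digest = hashlib.md5(body).hexdigest()
        by_digest.setdefault(digest, []).append(uri)
    # Only groups with more than one URI are duplicates.
    return [uris for uris in by_digest.values() if len(uris) > 1]
```

You would call `uri_issues("http://example.com/My_Page?id=1")` on each crawled URI, and pass `duplicate_pages` a dict of URI to raw page body. Note that a hash comparison like this only catches byte-for-byte identical pages, so near-duplicates would slip through.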