
WebcrawlerJS

Link to the WebcrawlerJS web interface (Firefox is required) →

Program description

Current search engines offer only plain string search and cannot answer queries about the structure of JavaScript code. WebcrawlerJS is a command-line program written in Java that downloads JavaScript files from the Internet with the help of Yahoo and Google CodeSearch and converts them into an XML representation using Mozilla's JavaScript parser Rhino. Whenever WebcrawlerJS finds addresses of further files, it tries to download them and add them to the current data pool. The pool contains no duplicates, regardless of which arguments (and search services) WebcrawlerJS is run with and how often it is run. The specific features of the individual search engines are hidden from the user.

The user can access the resulting XML files with a file manager or through Oracle's Berkeley DB XML database manager (DBXML). For access via a web browser, WebcrawlerJS automatically generates a web interface that links each XML file to its corresponding JavaScript code and provides statistical overviews of the data pool. The web interface requires no further configuration, works with any web server that has PHP 5 installed, and needs no database. Through it, the user can run XPath queries with PHP's SimpleXML on single files as well as on the whole data pool.

A public web interface is available for the current data pool of 2455 JavaScript files (38955 lines of code). The web interface requires the Firefox web browser.
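The crawl described above follows newly found addresses while keeping the data pool free of duplicates across runs. The following Java sketch shows one plausible way such a loop could work; it is not taken from the WebcrawlerJS source. Everything in it is a hypothetical assumption: the class `CrawlSketch`, a `fetch` function standing in for the real downloads via Yahoo and Google CodeSearch, a regular expression that treats quoted absolute `.js` URLs as new addresses, and a pool keyed by the SHA-256 hash of each file's content. The Rhino-to-XML conversion is left out.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.Function;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical sketch of a duplicate-free crawl loop; not the WebcrawlerJS implementation. */
public class CrawlSketch {

    // Assumption: new addresses are absolute http(s) URLs ending in ".js", enclosed in quotes.
    private static final Pattern JS_URL =
            Pattern.compile("[\"'](https?://[^\"'\\s]+?\\.js)[\"']");

    /** Hex-encoded SHA-256 of the file content, used as the pool key. */
    public static String contentKey(String source) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            byte[] digest = md.digest(source.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder();
            for (byte b : digest) {
                hex.append(String.format("%02x", b)); // two hex digits per byte
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 unavailable", e); // every JRE must provide it
        }
    }

    /**
     * Downloads the seed URLs, follows .js addresses found in their content and adds each
     * distinct file to the pool (content hash -> first URL seen). Returns the number of files
     * this run added. The fetch function returns null when a URL cannot be downloaded.
     */
    public static int crawl(List<String> seeds, Function<String, String> fetch,
                            Map<String, String> pool) {
        Deque<String> frontier = new ArrayDeque<>(seeds);
        Set<String> visited = new HashSet<>();
        int added = 0;
        while (!frontier.isEmpty()) {
            String url = frontier.poll();
            if (!visited.add(url)) {
                continue; // URL already handled in this run
            }
            String source = fetch.apply(url);
            if (source == null) {
                continue; // download failed
            }
            if (pool.putIfAbsent(contentKey(source), url) == null) {
                added++; // content was not yet in the pool
            }
            Matcher m = JS_URL.matcher(source);
            while (m.find()) {
                frontier.add(m.group(1)); // newly discovered address
            }
        }
        return added;
    }

    public static void main(String[] args) {
        // Stand-in for the Internet: URL -> JavaScript source (made-up example data).
        Map<String, String> web = Map.of(
                "http://example.org/a.js", "load('http://example.org/b.js'); var a = 1;",
                "http://example.org/b.js", "var b = 2;",
                "http://mirror.example.org/a.js", "load('http://example.org/b.js'); var a = 1;");
        Map<String, String> pool = new LinkedHashMap<>();
        List<String> seeds = List.of("http://example.org/a.js", "http://mirror.example.org/a.js");
        System.out.println(crawl(seeds, web::get, pool)); // 2: the mirror copy is a duplicate
        System.out.println(crawl(seeds, web::get, pool)); // 0: a second run adds nothing
    }
}
```

Keying the pool by content rather than by URL means that the same file reached through different search services or mirrors is stored only once; if the pool is persisted between program runs, repeated runs with the same or different arguments cannot introduce duplicates either.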



The web interface of WebcrawlerJS.


Description | Size/Type | Link
Program description (German) | 102 KB, PDF | help-doku.pdf
Webcrawler-Starterkit | 3 MB, ZIP archive | Webcrawler-Starterkit-1.0.zip
Webcrawler-Starterkit 2 | 3 MB, ZIP archive | Webcrawler-Starterkit_2.0.zip