Skip to content

kod3r/swarm

 
 

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Swarm is high-perfomance library for web crawling.

Depends on libcurl, uriparser, libxml2.

libcurl built with --enable-ares is highly needed for better perfomance.
libcurl must be built with SSL support for accessing https sites.

network_url is helper class for URI processing. It's not thread safe.
Before using set_base must be called, true is returned if argument is valid url.
network_url::normalized() returnes normalized form of url.
network_url::host() returnes host part of URI.
network_url::relative() returnes absolute URI generated from base and relative
argument, if host is passed it's set to host part of new URI.


url_finder is helper class for HTML parsing. It's not thread safe.
Cosntructor takes the only argument - content of HTML document.
url_finder::urls() parses the document and returnes list of all found urls
from <a href="..."/> tags

network_manager is helper class for url fetching. It's thread safe.
Constructor takes the only argument - libev++ event loop reference. Object
must be constructed in the same thread as event loop.
network_manager::set_limit() set limit for number of concurrent web requests.
network_manager::get() put network_request to processing queue. Callback will be
called as result is ready.

About

Swarm is aiming at your web

Resources

License

Stars

Watchers

Forks

Packages

No packages published