forked from reverbrain/swarm
Swarm is aiming at your web
kod3r/swarm
Swarm is a high-performance library for web crawling. It depends on libcurl, uriparser, and libxml2. Building libcurl with --enable-ares is strongly recommended for better performance, and libcurl must be built with SSL support to access https sites.

network_url is a helper class for URI processing. It is not thread safe. set_base must be called before use; it returns true if its argument is a valid URL. network_url::normalized() returns the normalized form of the URL. network_url::host() returns the host part of the URI. network_url::relative() returns the absolute URI generated from the base and the relative argument; if host is passed, it is set to the host part of the new URI.

url_finder is a helper class for HTML parsing. It is not thread safe. The constructor takes a single argument, the content of the HTML document. url_finder::urls() parses the document and returns a list of all URLs found in <a href="..."/> tags.

network_manager is a helper class for URL fetching. It is thread safe. The constructor takes a single argument, a reference to a libev++ event loop. The object must be constructed in the same thread as the event loop. network_manager::set_limit() sets the limit on the number of concurrent web requests. network_manager::get() puts a network_request into the processing queue. The callback is called once the result is ready.
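
The idea of a concurrency limit combined with a processing queue, as described for network_manager::set_limit() and network_manager::get(), can be illustrated with a small standalone sketch. This is not Swarm's implementation or its API: request_limiter, its finished() method, and the start callback are hypothetical names introduced only for this example. Unlike network_manager, the sketch is single-threaded and performs no network I/O. It only shows how queued URLs might be held back until a slot becomes free.

```cpp
#include <cstddef>
#include <functional>
#include <queue>
#include <string>
#include <utility>

// Hypothetical illustration only; NOT part of Swarm and not thread safe.
// Keeps at most `limit` requests in flight and queues the rest.
class request_limiter {
public:
    // Assumed callback type: invoked when a queued URL is allowed to start.
    using start_fn = std::function<void(const std::string &url)>;

    request_limiter(std::size_t limit, start_fn start)
        : limit_(limit), active_(0), start_(std::move(start)) {}

    // Similar in spirit to set_limit(): change the cap on in-flight requests.
    void set_limit(std::size_t limit) {
        limit_ = limit;
        pump();
    }

    // Similar in spirit to get(): enqueue a URL, start it if a slot is free.
    void get(std::string url) {
        pending_.push(std::move(url));
        pump();
    }

    // The caller reports that one in-flight request has completed.
    void finished() {
        if (active_ > 0)
            --active_;
        pump();
    }

    std::size_t active() const { return active_; }
    std::size_t pending() const { return pending_.size(); }

private:
    // Start queued requests while there is spare capacity.
    void pump() {
        while (active_ < limit_ && !pending_.empty()) {
            std::string url = std::move(pending_.front());
            pending_.pop();
            ++active_;
            start_(url);
        }
    }

    std::size_t limit_;
    std::size_t active_;
    std::queue<std::string> pending_;
    start_fn start_;
};
```

With a limit of 2, three calls to get() would start the first two URLs immediately and leave the third pending until finished() is called once. A real crawler would trigger that completion step from the event loop when a transfer ends.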