Skip to content

jeffcrouse/craig2kml

Repository files navigation

craig2kml
By Jeff Crouse
jefftimesten at gmail dot com
http://www.jeffcrouse.info

This software is licensed under the GNU-GPL version 3 or later.

craig2kml is free software; you can redistribute it and/or
modify it under the terms of the GNU General Public
License as published by the Free Software Foundation; either
version 3 of the License, or (at your option) any later version.

craig2kml is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
General Public License for more details.


INTRODUCTION
----------------
This project started as Earthify (http://www.earthify.org), which kept breaking. I also heard recently that some websites that were scraping Craigslist were being blocked. So I decided to take the idea and turn it into a simple, easily reconfigurable c++ program that anyone could use on their own computer, or install on a server and make a Google Maps wrapper.


REQUIREMENTS
----------------
TidyLib		http://tidy.sourceforge.net/
libXML2		http://xmlsoft.org/
libKML  	http://code.google.com/p/libkml/
libCurl		http://curl.haxx.se/
PCRE		http://pcre.org/

EXAMPLE USAGE
---------------
craig2kml --url "http://newyork.craigslist.org/search/nfa?srchType=A&maxAsk=2000" -o max2000.kml

This will create the KML file "max2000.kml" and populate it with placemarks.  There are 2 folders: Mappable Listings and Unmappable Listings. The unmappable ones (those without a google maps link or fail geocoding) are placed in the middle of the mappable ones.

INSTALL
---------------
There is no configure script, so all you have to do is:

make


CONFIG FILE
---------------
This distribution comes with a sample configuration file called craig2kml.config.  Among other settings, this file contains XPATHs used to find the information on Craigslist that is needed to make the KML file.  In the case that Craigslsit changes its design, these will need to be changed.  


FAQ
-------------
1.  Why not just use the rss feeds?
The RSS feeds only include the most recent 25 entries.  By scraping the page, you can get all 100.  In future versions, it will follow the pagination also.

2. Is there a limit to how often I can use craig2kml?
Unless you are using caching, each listing that is parsed makes a call to the Google Geocoder API.  According to the "Usage Limits" section of the documentation (http://code.google.com/apis/maps/documentation/geocoding/#Limits), you can only perform 2,500 queries per day.  Presumably this is based on IP address.

If you are using caching, the calls to the Geocode API will be saved, so each listing will only use 1 call to the API the first time it is loaded.

3. The makefile isn't working.  What's up?
I use premake4 to generate makefiles.  You can find more information about it here http://industriousone.com/premake  This is not a requirement for building, but if the makefile is failing, you might want to take a look at the premake documentation.


TO DO
--------------
- Check to make sure that  we can create the file at the beginning.
- Better error checking
- use regular expression instead of xpath for google maps link?
- Use other contextual information to map non-mappable listings, such as title of the page and other geographic data
- Create option to follow pagination
- There are probably tons of memory leaks. I haven't done any leak testing.
- Make unmappable ones a different color
- resize images to max width
- Make cache files expire

About

c++ utility that takes a Craigslist search URL and turns listings into a KML file, which is loadable by many mapping programs including Google Earth and Google Maps

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published