ptrrkssn/phtx

Peter's HTML Table Data Extractor


LICENSE:

This program is free software; you can redistribute it and/or
modify it as you wish - as long as you don't claim that you wrote
it.

This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.


DESCRIPTION:

This is a small tool for extracting data from HTML tables in files.
It strips extra whitespace and any remaining HTML tags from the cell data and
writes the result as CSV on stdout.
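
The tag stripping, whitespace collapsing and CSV output described above can be
sketched in C roughly as follows. This is not taken from the phtx source. The
function names clean_cell, write_csv_field and write_csv_row are hypothetical,
the cells are assumed to have already been split out of the table, entities are
not decoded, and the quoting follows RFC 4180 conventions, which may differ
from what the tool actually emits.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: copy the text of one table cell from `in` to `out`,
 * dropping everything between '<' and '>' and collapsing each run of
 * whitespace into a single space. Leading and trailing whitespace is
 * removed, tags are removed without inserting a space, and text that does
 * not fit in `out` is truncated. Entities such as &amp; are left as-is. */
static void clean_cell(const char *in, char *out, size_t out_size)
{
    size_t n = 0;
    int in_tag = 0;
    int pending_space = 0;

    assert(out != NULL && out_size > 0);
    for (; *in != '\0'; in++) {
        unsigned char c = (unsigned char) *in;

        if (in_tag) {                 /* skip until the tag is closed */
            if (c == '>')
                in_tag = 0;
            continue;
        }
        if (c == '<') {
            in_tag = 1;
            continue;
        }
        if (isspace(c)) {             /* remember the gap, emit it later */
            pending_space = 1;
            continue;
        }
        if (pending_space && n > 0) {
            if (n + 2 >= out_size)    /* no room for " x": truncate here */
                break;
            out[n++] = ' ';
        }
        pending_space = 0;
        if (n + 1 >= out_size)        /* buffer full: truncate here */
            break;
        out[n++] = (char) c;
    }
    out[n] = '\0';
}

/* Write one CSV field: quote it if it contains a comma, a double quote,
 * CR or LF, and double any embedded double quotes (RFC 4180 style). */
static void write_csv_field(FILE *fp, const char *field)
{
    if (strpbrk(field, ",\"\r\n") == NULL) {
        fputs(field, fp);
        return;
    }
    fputc('"', fp);
    for (; *field != '\0'; field++) {
        if (*field == '"')
            fputc('"', fp);
        fputc(*field, fp);
    }
    fputc('"', fp);
}

/* Clean each raw cell and write them as one comma-separated line.
 * Cells longer than the local buffer are truncated in this sketch. */
static void write_csv_row(FILE *fp, const char *const cells[], size_t ncells)
{
    char buf[1024];

    for (size_t i = 0; i < ncells; i++) {
        clean_cell(cells[i], buf, sizeof buf);
        if (i > 0)
            fputc(',', fp);
        write_csv_field(fp, buf);
    }
    fputc('\n', fp);
}
```

With this sketch, a row whose raw cells are `<td> Hello, <b>world</b></td>` and
`<td>  plain </td>` would be written as the line `"Hello, world",plain`, since
the first field contains a comma and therefore has to be quoted.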

If you find any bugs in the code, feel free to send me patches at:

	Peter Eriksson <pen@lysator.liu.se>


KNOWN PROBLEMS / LIMITATIONS:

The code assumes the HTML file uses ASCII or ISO 8859-1 (Latin-1) encoding. This is
mostly only an issue if the source uses HTML character entities such as &auml;.
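
To see why entities matter, here is a hypothetical C sketch (not taken from
phtx) of decoding a few named entities. An entity such as &auml; names the code
point U+00E4. Written as Latin-1 that is the single byte 0xE4, but in UTF-8 the
same character takes the two bytes 0xC3 0xA4. Assuming the tool writes decoded
entities as Latin-1 bytes, entities in an otherwise UTF-8 file would therefore
come out as byte sequences that are not valid UTF-8. The entity table, the
function names and the utf8 flag below are illustrative assumptions.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* A handful of named entities and their code points (all within Latin-1).
 * A real decoder would need the full HTML entity list. */
static const struct {
    const char *name;
    unsigned int codepoint;
} entities[] = {
    { "amp",  0x26 }, { "lt",    0x3C }, { "gt",    0x3E },
    { "quot", 0x22 }, { "nbsp",  0xA0 }, { "auml",  0xE4 },
    { "ouml", 0xF6 }, { "aring", 0xE5 }, { "eacute", 0xE9 },
};

/* Hypothetical lookup: return the code point for an entity name such as
 * "auml" (without '&' and ';'), or -1 if the name is unknown. */
static long entity_codepoint(const char *name)
{
    for (size_t i = 0; i < sizeof entities / sizeof entities[0]; i++)
        if (strcmp(entities[i].name, name) == 0)
            return (long) entities[i].codepoint;
    return -1;
}

/* Encode a code point in the range 0..0xFF. In Latin-1 that is always one
 * byte equal to the code point; in UTF-8, code points from 0x80 upwards
 * need two bytes. Returns the number of bytes written to `out`. */
static size_t encode_latin1_range(unsigned int cp, int utf8, unsigned char out[2])
{
    assert(cp <= 0xFF);
    if (!utf8 || cp < 0x80) {
        out[0] = (unsigned char) cp;
        return 1;
    }
    out[0] = (unsigned char) (0xC0 | (cp >> 6));   /* 110xxxxx */
    out[1] = (unsigned char) (0x80 | (cp & 0x3F)); /* 10xxxxxx */
    return 2;
}
```

Plain ASCII entities such as &amp; encode to the same single byte either way,
which is why the problem only shows up for characters above 0x7F.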

There is currently a hardcoded limit of 256 tables per source file.


- Peter