[Linux-disciples] Using a parser to delete the child
Stephen R Laniel
steve at laniels.org
Thu Jun 15 10:12:36 EDT 2006
I'm using HTML::TreeBuilder [1] and HTML::Element [2] -- the
former of which inherits from the latter -- to parse some
HTML. I've used it so far to extract any elements which have
'class="contentCol"'. Now what I'd like to do is:
1) delete the start tag of each 'class="contentCol"' element
   and its corresponding close tag;
2) delete any child -- not any other descendants, just
   children -- of the 'class="contentCol"' elements, if and
   only if the 'class="contentCol"' element is a table and
   the children are <td>, <tr>, etc.
I'm just getting started with recursive-descent parsers, so
I'm a little unsure how to encode my wishes. Can anyone out
there with more experience on such things help a guy
out?
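To make the two steps concrete, here is a minimal sketch of one way they might be written with HTML::Element's tree methods (look_down, content_list, replace_with_content). It reads "delete" as "remove the tags but keep what is inside"; if the children should disappear together with their contents, $child->delete would be the call instead. The function name strip_content_col, the sample HTML, and the list of tags that "etc." covers are assumptions made for illustration only.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTML::TreeBuilder;

# Assumption: the tags that "<td>, <tr>, etc." is meant to cover.
my %table_part = map { $_ => 1 } qw(tr td th thead tbody tfoot);

# Hypothetical helper: returns the HTML with the contentCol wrappers removed.
sub strip_content_col {
    my ($html) = @_;

    my $tree = HTML::TreeBuilder->new;
    $tree->parse($html);
    $tree->eof;

    # In list context, look_down returns every matching element.
    for my $elem ($tree->look_down(class => 'contentCol')) {
        # Skip elements already unwrapped as a child of an earlier match.
        next unless $elem->parent;

        if ($elem->tag eq 'table') {
            # Step 2: direct children only. content_list is a snapshot,
            # so grandchildren promoted by an unwrap are left alone.
            for my $child ($elem->content_list) {
                next unless ref $child;                  # plain text node
                next unless $table_part{ $child->tag };
                # Splice the child's content into its place, then free it.
                $child->replace_with_content->delete;
            }
        }

        # Step 1: drop the element's own tags but keep its content.
        $elem->replace_with_content->delete;
    }

    my $out = $tree->as_HTML(undef, '  ');
    $tree->delete;    # free the tree explicitly
    return $out;
}

# Made-up sample input; the real page is not shown in this message.
my $sample = <<'END_HTML';
<div class="contentCol"><p>Keep this paragraph.</p></div>
<table class="contentCol"><tr><td>Keep this cell.</td></tr></table>
END_HTML

print strip_content_col($sample), "\n";
```

Note that no hand-written recursion is needed here: TreeBuilder has already built the tree, so the work is a flat loop over the matches plus one loop over each table's immediate children.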
[1] - http://search.cpan.org/~sburke/HTML-Tree-3.18/lib/HTML/TreeBuilder.pm
[2] - http://search.cpan.org/~sburke/HTML-Tree-3.18/lib/HTML/Element.pm
--
Stephen R. Laniel
steve at laniels.org
Cell: +(617) 308-5571
http://laniels.org/
PGP key: http://laniels.org/slaniel.key