[Linux-disciples] Using a parser to delete the child

Stephen R Laniel steve at laniels.org
Thu Jun 15 10:12:36 EDT 2006


I'm using HTML::TreeBuilder [1] and HTML::Element [2] -- the
former of which inherits from the latter -- to parse some
HTML. So far I've used it to extract every element that has
'class="contentCol"'. Now what I'd like to do is:

1) delete the tag carrying 'class="contentCol"' and any
   corresponding close tag
2) delete any child -- not any other descendants, just
   children -- of the 'class="contentCol"' element, if and
   only if that element is a table and the children are
   <td>, <tr>, etc.

I'm just getting started with recursive-descent parsers, so
I'm a little unsure how to encode my wishes. Can anyone out
there with more experience with such things help a guy out?
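
For illustration, here is a minimal sketch of the two steps
above. It is one possible reading, not a definitive answer.
It assumes that "delete" means removing an element's own
start and end tags while keeping its content, which is what
HTML::Element's replace_with_content() does. The sample HTML,
the helper name unwrap_content_cols, and the %table_part tag
list (standing in for "<td>, <tr>, etc.") are all
hypothetical. Note that look_down(class => ...) compares the
whole attribute value, so class="contentCol other" would not
match.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTML::TreeBuilder;

# Hypothetical sample input; the real HTML is not shown in this post.
my $html = <<'END_HTML';
<html><body>
<table class="contentCol"><tr><td><p>Cell text</p></td></tr></table>
<div class="contentCol"><p>Div text</p></div>
</body></html>
END_HTML

# Assumed set of table-structure tags standing in for "<td>, <tr>, etc."
my %table_part = map { $_ => 1 }
    qw(caption colgroup col thead tbody tfoot tr td th);

# Hypothetical helper: unwrap every class="contentCol" element in $tree.
sub unwrap_content_cols {
    my ($tree) = @_;

    # In list context, look_down returns every matching element.
    for my $elem ($tree->look_down(class => 'contentCol')) {
        # Skip elements already removed earlier in this loop.
        next unless $elem->parent;

        if ($elem->tag eq 'table') {
            # content_list yields direct children only. Text nodes are
            # plain strings rather than objects, so filter with ref().
            for my $child (grep { ref $_ } $elem->content_list) {
                next unless $table_part{ $child->tag };
                # Splice the child's content into its place, then
                # discard the now-empty child element.
                my $empty = $child->replace_with_content;
                $empty->delete;
            }
        }

        # Step 1: drop the contentCol element itself, keeping its content.
        my $empty = $elem->replace_with_content;
        $empty->delete;
    }
    return $tree;
}

my $tree = HTML::TreeBuilder->new_from_content($html);
unwrap_content_cols($tree);
my $out = $tree->as_HTML;
print $out, "\n";
$tree->delete;
```

Only direct children of the table are unwrapped here, so a
<td> nested inside a <tr> is left in place. If the children
should instead be removed along with everything inside them,
calling $child->delete in place of replace_with_content would
do that.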

[1] - http://search.cpan.org/~sburke/HTML-Tree-3.18/lib/HTML/TreeBuilder.pm
[2] - http://search.cpan.org/~sburke/HTML-Tree-3.18/lib/HTML/Element.pm

-- 
Stephen R. Laniel
steve at laniels.org
Cell: +(617) 308-5571
http://laniels.org/
PGP key: http://laniels.org/slaniel.key