[Linux-disciples] Backslashes within quotes

Adam Kessel linux-disciples@bostoncoop.net
Tue, 25 Nov 2003 19:36:31 -0800


I don't think there is any way for a simple regexp to know whether a
quote is a closing quote or an opening quote or whatever--that's why for
programming we have such things as, e.g., open paren or open brace and
matching close paren and close brace.

I would suggest instead perhaps using single quotes within:

book="My Life As a 'Monkey'"

And then your regexp could, if you like, turn those single quotes back
into double quotes when found within the double quotes.
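A minimal sketch of that idea, written in Python rather than Perl (the pattern syntax should carry over to a Perl s/// with little change). The function name convert_single_quoted is hypothetical, and the span markup is taken from the quoted message below. It assumes titles never contain a literal double quote.

```python
import re

# Matches book="..." where the title contains no double quotes.
BOOK = re.compile(r'book="([^"]*)"')

def convert_single_quoted(text):
    """Wrap each book title in a span, restoring inner double quotes."""
    def repl(match):
        # Turn each *pair* of single quotes inside the title into double quotes.
        title = re.sub(r"'([^']*)'", r'"\1"', match.group(1))
        return '<span class="booktitle">' + title + '</span>'
    return BOOK.sub(repl, text)

print(convert_single_quoted('''book="My Life As a 'Monkey'"'''))
# <span class="booktitle">My Life As a "Monkey"</span>
```

Because only paired single quotes are converted, a lone apostrophe (as in "You're") is left alone, but a title that mixes an apostrophe with single-quoted text would still be mangled.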

A more complicated solution would be to count the number of quotes on a
given line and try to guess what is "open" and "close."
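As a rough sketch of what such counting might look like (Python, with a hypothetical helper named outer_quoted; the rule "first quote opens, last quote closes" is just one possible guess):

```python
def outer_quoted(line):
    """Guess the outermost quoted span on one line by counting double quotes."""
    positions = [i for i, ch in enumerate(line) if ch == '"']
    # An odd count, or fewer than two quotes, cannot be paired at all.
    if len(positions) < 2 or len(positions) % 2:
        return None
    # Guess: the first quote opens and the last quote closes.
    return line[positions[0] + 1:positions[-1]]

print(outer_quoted('book="Screwed: Life Aboard "The Titanic""'))
# Screwed: Life Aboard "The Titanic"
```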

This would have a couple of problems: (1) if your text spanned more than
one line it would fail, and (2) more importantly, it couldn't handle:

I am "quoting this" and "quoting that."

How would it know which were the inner quotes and which were the outer? I
suppose it could look even further at the spaces preceding and following
quotes to figure that out, but it just seems unnecessarily complex.
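For comparison, the backslash-escaping option from the quoted message below needs no guessing, since an escaped quote can never be mistaken for a closing one. Here is a sketch in Python (the same pattern should work in a Perl s///); the function name convert_escaped is hypothetical, and it assumes a backslash always escapes the character that follows it.

```python
import re

# Inside the quotes, accept either an ordinary character (anything except
# a double quote or a backslash) or a backslash followed by any character.
BOOK = re.compile(r'book="((?:[^"\\]|\\.)*)"')

def convert_escaped(text):
    """Wrap each book title in a span, removing the escaping backslashes."""
    def repl(match):
        # Drop the backslash from each escaped character: \" becomes ".
        title = re.sub(r'\\(.)', r'\1', match.group(1))
        return '<span class="booktitle">' + title + '</span>'
    return BOOK.sub(repl, text)

print(convert_escaped(r'book="My Life As a \"Monkee\""'))
# <span class="booktitle">My Life As a "Monkee"</span>
```

This also copes with more than one title on a line, because the match stops at the first unescaped quote instead of running on to the last one.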

On Mon, Nov 24, 2003 at 04:47:58PM -0500, Stephen R Laniel wrote:
> I'm looking to write a Perl regex that would convert, e.g.,
> book="(.*)" into <span class="booktitle">$1</span> (with more
> generality, but that's the idea).
>
> The trouble is, what if $1 contains quotation marks? Let's imagine I had
> a book called "Screwed: Life Aboard "The Titanic"". How could I write a
> regex that would properly turn this into
> <span class="booktitle">Screwed: Life Aboard "The Titanic"</span>?
>
> I could insist that my users escape each nested quote with a backslash, a
> la
> book="My Life As a \"Monkee\""
>
> which seems like the best solution for now; embedded quotes are quite
> rare. (Of the 359 books that I own, I find only two that have quotes
> within the title. They're both by Feynman: _"Surely You're Joking, Mr.
> Feynman"_ and _"What Do You Care What Other People Think?"_)
>
> But I don't know how to write the regex to properly handle escaped
> quotation marks. Any ideas?
>
> Also, what if I have a movie title within a book title, so that I might
> want to write
>
> book="movie="Adaptation": The Screenplay"
>
> Users shouldn't be expected to escape the quotes around the movie title.
> A first pass over this string would turn movie="Adaptation" into
> <span class="movietitle">Adaptation</span>, which would then be a pair
> of unescaped quotation marks within the outer set.
>
> So. Hm. Any ideas how to handle edge cases like this?
-- 
Adam Kessel
http://bostoncoop.net/adam
