<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Mundo Ruby &#187; regexp</title>
	<atom:link href="http://www.mundoruby.com.ar/tag/regexp/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.mundoruby.com.ar</link>
	<description>Ruby Artists, Hackers, and other stuff ...</description>
	<lastBuildDate>Wed, 12 Aug 2009 23:02:13 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=2.8.4</generator>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
			<item>
		<title>Small delights of regular expressions</title>
		<link>http://www.mundoruby.com.ar/2009/06/13/pequenas-delicias-de-las-expresiones-regulares/</link>
		<comments>http://www.mundoruby.com.ar/2009/06/13/pequenas-delicias-de-las-expresiones-regulares/#comments</comments>
		<pubDate>Sat, 13 Jun 2009 19:57:28 +0000</pubDate>
		<dc:creator>FreedomCoder</dc:creator>
				<category><![CDATA[Open Source]]></category>
		<category><![CDATA[Programming]]></category>
		<category><![CDATA[how-to]]></category>
		<category><![CDATA[regexp]]></category>
		<category><![CDATA[Ruby]]></category>

		<guid isPermaLink="false">http://www.mundoruby.com.ar/?p=113</guid>
		<description><![CDATA[As I mentioned here and here, I am writing a tokenizer for a wiki I am building. Today I ran into something very strange about regular expressions.
In Ruby, the match method finds the first match of a regex within a string. For example (using irb):

irb(main):001:0&#62; m = /a/.match "babab"
=&#62; #&#60;MatchData "a"&#62;
irb(main):002:0&#62; m.pre_match
=&#62; [...]]]></description>
			<content:encoded><![CDATA[<p>As I mentioned <a href="http://aurelianito.blogspot.com/2009/06/tokenizer-de-rapidito.html">here</a> and <a href="http://aurelianito.blogspot.com/2009/06/tokenizer-de-rapidito-segunda-version.html">here</a>, I am writing a tokenizer for a wiki I am building. Today I ran into something very strange about regular expressions.<br />
In Ruby, the <code>match</code> method finds the first match of a regex within a string. For example (using irb):</p>
<pre style="font: normal normal normal 12px/18px Consolas, Monaco, 'Courier New', Courier, monospace;">
<pre style="font: normal normal normal 12px/18px Consolas, Monaco, 'Courier New', Courier, monospace; font-size: 1.1em; background-color: #ffffcc; color: #000000; overflow-x: auto; overflow-y: auto; margin-top: 4px; margin-right: 0px; margin-bottom: 4px; margin-left: 0px; width: 631px; padding: 10px;">irb(main):001:0&gt; m = /a/.match "babab"
=&gt; #&lt;MatchData "a"&gt;
irb(main):002:0&gt; m.pre_match
=&gt; "b"
irb(main):003:0&gt; m[0]
=&gt; "a"</pre>
</pre>
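<p>As a small sketch of the same idea outside irb, the pieces of a <code>MatchData</code> can be pulled apart and glued back together. The variable names here are mine, chosen only for illustration:</p>

```ruby
# Sketch: inspect the pieces of a MatchData for the first match of /a/.
m = /a/.match("babab")

before  = m.pre_match   # text before the first match
matched = m[0]          # the matched text itself
after   = m.post_match  # text after the first match

puts before.inspect   # "b"
puts matched.inspect  # "a"
puts after.inspect    # "bab"

# The three pieces always reassemble into the original string.
rebuilt = before + matched + after
puts(rebuilt == "babab")
```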
<p>In particular, <code>pre_match</code> is the part of the string that comes before the match. I had also understood (wrongly) that <code>/\Z/</code> matches only at the end of the string. For example:</p>
<pre style="font: normal normal normal 12px/18px Consolas, Monaco, 'Courier New', Courier, monospace;">
<pre style="font: normal normal normal 12px/18px Consolas, Monaco, 'Courier New', Courier, monospace; font-size: 1.1em; background-color: #ffffcc; color: #000000; overflow-x: auto; overflow-y: auto; margin-top: 4px; margin-right: 0px; margin-bottom: 4px; margin-left: 0px; width: 631px; padding: 10px;">irb(main):004:0&gt; m = /\Z/.match "hola"
=&gt; #&lt;MatchData ""&gt;
irb(main):005:0&gt; m.pre_match
=&gt; "hola"</pre>
</pre>
<pre style="font: normal normal normal 12px/18px Consolas, Monaco, 'Courier New', Courier, monospace;"><span style="font-family: Georgia, 'Times New Roman', 'Bitstream Charter', Times, serif; line-height: 19px; white-space: normal; font-size: 13px;">But <code>/\Z/</code> has a very strange, although <a href="http://www.regular-expressions.info/reference.html">documented</a>, behavior when the last character before the end is a <code>\n</code>: it also matches just before that final <code>\n</code>. As a result, <code>pre_match</code> ends up without the trailing <code>\n</code>! Here it is in irb:</span></pre>
<pre style="font: normal normal normal 12px/18px Consolas, Monaco, 'Courier New', Courier, monospace;">
<pre style="font: normal normal normal 12px/18px Consolas, Monaco, 'Courier New', Courier, monospace; font-size: 1.1em; background-color: #ffffcc; color: #000000; overflow-x: auto; overflow-y: auto; margin-top: 4px; margin-right: 0px; margin-bottom: 4px; margin-left: 0px; width: 631px; padding: 10px;">irb(main):006:0&gt; m = /\Z/.match "\n"
=&gt; #&lt;MatchData ""&gt;
irb(main):007:0&gt; m.pre_match
=&gt; ""</pre>
<p><span style="font-family: Georgia, 'Times New Roman', 'Bitstream Charter', Times, serif; line-height: 19px; white-space: normal; font-size: 13px;">To keep it from swallowing the <code>\n</code>, you have to use <code>/\z/</code> (lowercase!):</span></pre>
<pre style="font: normal normal normal 12px/18px Consolas, Monaco, 'Courier New', Courier, monospace;">
<pre style="font: normal normal normal 12px/18px Consolas, Monaco, 'Courier New', Courier, monospace; font-size: 1.1em; background-color: #ffffcc; color: #000000; overflow-x: auto; overflow-y: auto; margin-top: 4px; margin-right: 0px; margin-bottom: 4px; margin-left: 0px; width: 631px; padding: 10px;">irb(main):008:0&gt; m = /\z/.match "\n"
=&gt; #&lt;MatchData ""&gt;
irb(main):009:0&gt; m.pre_match
=&gt; "\n"</pre>
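<p>The difference is easier to see side by side. This is only a sketch with a sample string of my own (<code>"hola\n"</code>), not something taken from the tokenizer:</p>

```ruby
# Sketch: where each end anchor matches when the string ends in "\n".
text = "hola\n"

upper = /\Z/.match(text)  # end of string, or just before a final "\n"
lower = /\z/.match(text)  # only the true end of the string

puts upper.pre_match.inspect  # "hola"
puts lower.pre_match.inspect  # "hola\n"

# Without a trailing newline the two anchors agree.
same = /\Z/.match("hola").pre_match == /\z/.match("hola").pre_match
puts same  # true
```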
<p><span style="font-family: Georgia, 'Times New Roman', 'Bitstream Charter', Times, serif; line-height: 19px; white-space: normal; font-size: 13px;">So I had to change the tokenizer. The <code>initialize</code> method now looks like this (note the change from "Z" to "z"):</span></pre>
<pre style="font: normal normal normal 12px/18px Consolas, Monaco, 'Courier New', Courier, monospace;">
<pre style="font: normal normal normal 12px/18px Consolas, Monaco, 'Courier New', Courier, monospace; font-size: 1.1em; background-color: #ffffcc; color: #000000; overflow-x: auto; overflow-y: auto; margin-top: 4px; margin-right: 0px; margin-bottom: 4px; margin-left: 0px; width: 631px; padding: 10px;">    <span style="color: #aa0000; font-weight: bold;">def </span><span style="color: #007777;">initialize</span><span style="color: #444477; font-weight: bold;">(</span> <span style="color: #000044;">delimiters</span> <span style="color: #444477; font-weight: bold;">)</span>
      <span style="color: #337777;">@delimiter_list</span> <span style="color: #444477; font-weight: bold;">=</span> <span style="color: #444477; font-weight: bold;">[[/</span><span style="color: #bb6666;"><span>\z</span></span><span style="color: #444477; font-weight: bold;">/,</span> <span style="color: #009999;">:finish</span><span style="color: #444477; font-weight: bold;">]]</span> <span style="color: #444477; font-weight: bold;">+</span>
        <span style="color: #000044;">delimiters</span><span style="color: #444477; font-weight: bold;">.</span><span style="color: #000044;">to_a</span><span style="color: #444477; font-weight: bold;">.</span><span style="color: #000044;">map</span> <span style="color: #444477; font-weight: bold;">{</span> <span style="color: #444477; font-weight: bold;">|</span><span style="color: #000044;">k</span><span style="color: #444477; font-weight: bold;">,</span><span style="color: #000044;">arr</span><span style="color: #444477; font-weight: bold;">|</span> <span style="color: #000044;">arr</span><span style="color: #444477; font-weight: bold;">.</span><span style="color: #000044;">map</span> <span style="color: #444477; font-weight: bold;">{</span> <span style="color: #444477; font-weight: bold;">|</span><span style="color: #000044;">re</span><span style="color: #444477; font-weight: bold;">|</span> <span style="color: #444477; font-weight: bold;">[</span><span style="color: #000044;">re</span><span style="color: #444477; font-weight: bold;">,</span> <span style="color: #000044;">k</span><span style="color: #444477; font-weight: bold;">]</span> <span style="color: #444477; font-weight: bold;">}</span> <span style="color: #444477; font-weight: bold;">}.</span><span style="color: #000044;">inject</span><span style="color: #444477; font-weight: bold;">([])</span> <span style="color: #444477; font-weight: bold;">{</span> <span style="color: #444477; font-weight: bold;">|</span><span style="color: #000044;">ac</span><span style="color: #444477; font-weight: bold;">,</span><span style="color: #000044;">ps</span><span style="color: #444477; font-weight: bold;">|</span> <span style="color: #000044;">ac</span> <span style="color: #444477; font-weight: bold;">+</span> <span style="color: #000044;">ps</span> <span style="color: #444477; font-weight: bold;">}</span>
      <span style="color: #337777;">@match_cache</span> <span style="color: #444477; font-weight: bold;">=</span> <span style="color: #0077ff;">nil</span>
    <span style="color: #aa0000; font-weight: bold;">end</span></pre>
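<p>To see what that expression builds, here is a standalone sketch. The <code>delimiters</code> hash below is hypothetical (the kinds <code>:bang</code> and <code>:space</code> are made up for the example), and <code>flat_map</code> is shown only as a possible alternative on Ruby versions that provide it, not as what the tokenizer uses:</p>

```ruby
# Hypothetical delimiters hash: token kind => list of regexes.
delimiters = { :bang => [/!/], :space => [/ /, /\t/] }

# Same shape as the initialize above: the finish pattern first,
# then one [regex, kind] pair per delimiter regex.
pairs = [[/\z/, :finish]] +
  delimiters.to_a.map { |k, arr| arr.map { |re| [re, k] } }.inject([]) { |ac, ps| ac + ps }

# flat_map builds the same list without the explicit inject.
flat = [[/\z/, :finish]] + delimiters.flat_map { |k, arr| arr.map { |re| [re, k] } }

# Print each pair, then check that both constructions agree.
pairs.each { |re, kind| puts "#{re.inspect} -> #{kind}" }
puts(pairs == flat)
```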
<p><span style="font-family: Georgia, 'Times New Roman', 'Bitstream Charter', Times, serif; line-height: 19px; white-space: normal; font-size: 13px;">And the test that captures the problem caused by using <code>\Z</code> instead of <code>\z</code> looks like this:</span></pre>
<pre style="font: normal normal normal 12px/18px Consolas, Monaco, 'Courier New', Courier, monospace;">
<pre style="font: normal normal normal 12px/18px Consolas, Monaco, 'Courier New', Courier, monospace; font-size: 1.1em; background-color: #ffffcc; color: #000000; overflow-x: auto; overflow-y: auto; margin-top: 4px; margin-right: 0px; margin-bottom: 4px; margin-left: 0px; width: 631px; padding: 10px;">  <span style="color: #aa0000; font-weight: bold;">def </span><span style="color: #007777;">test_carriage_return_ending</span>
    <span style="color: #000044;">tok</span> <span style="color: #444477; font-weight: bold;">=</span> <span style="color: #0077ff;">Tokenizer</span><span style="color: #444477; font-weight: bold;">.</span><span style="color: #000044;">new</span><span style="color: #444477; font-weight: bold;">(</span> <span style="color: #009999;">:a_kind</span> <span style="color: #444477; font-weight: bold;">=&gt;</span> <span style="color: #444477; font-weight: bold;">[/</span><span style="color: #bb6666;">!</span><span style="color: #444477; font-weight: bold;">/]</span> <span style="color: #444477; font-weight: bold;">)</span>
    <span style="color: #000044;">tok</span><span style="color: #444477; font-weight: bold;">.</span><span style="color: #000044;">source</span> <span style="color: #444477; font-weight: bold;">=</span> <span style="color: #444477; font-weight: bold;">"</span><span style="color: #994444;">bang!<span>\n</span></span><span style="color: #444477; font-weight: bold;">"</span>
    <span style="color: #000044;">tok</span><span style="color: #444477; font-weight: bold;">.</span><span style="color: #000044;">next_token</span>
    <span style="color: #000044;">assert_equal</span> <span style="color: #0077ff;">true</span><span style="color: #444477; font-weight: bold;">,</span> <span style="color: #000044;">tok</span><span style="color: #444477; font-weight: bold;">.</span><span style="color: #000044;">has_next?</span>
    <span style="color: #000044;">tok</span><span style="color: #444477; font-weight: bold;">.</span><span style="color: #000044;">next_token</span>
    <span style="color: #000044;">assert_equal</span> <span style="color: #0077ff;">true</span><span style="color: #444477; font-weight: bold;">,</span> <span style="color: #000044;">tok</span><span style="color: #444477; font-weight: bold;">.</span><span style="color: #000044;">has_next?</span>
    <span style="color: #000044;">assert_equal</span> <span style="color: #444477; font-weight: bold;">"</span><span style="color: #994444;"><span>\n</span></span><span style="color: #444477; font-weight: bold;">",</span> <span style="color: #000044;">tok</span><span style="color: #444477; font-weight: bold;">.</span><span style="color: #000044;">next_token</span><span style="color: #444477; font-weight: bold;">[</span><span style="color: #dd5555;">0</span><span style="color: #444477; font-weight: bold;">].</span><span style="color: #000044;">to_s</span>
    <span style="color: #000044;">assert_equal</span> <span style="color: #0077ff;">false</span><span style="color: #444477; font-weight: bold;">,</span> <span style="color: #000044;">tok</span><span style="color: #444477; font-weight: bold;">.</span><span style="color: #000044;">has_next?</span>
  <span style="color: #aa0000; font-weight: bold;">end</span></pre>
<p><span style="font-family: Georgia, 'Times New Roman', 'Bitstream Charter', Times, serif; line-height: 19px; white-space: normal; font-size: 13px;">Happy hacking,</span></pre>
<p>Aureliano.</p>
<p>(Via <a href="http://aurelianito.blogspot.com/">aurelianito</a>.) Original link: <a href="http://aurelianito.blogspot.com/2009/06/pequenas-delicias-de-las-expresiones.html">Small delights of regular expressions</a></p>
]]></content:encoded>
			<wfw:commentRss>http://www.mundoruby.com.ar/2009/06/13/pequenas-delicias-de-las-expresiones-regulares/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>

