Use "magic" HTML comments to protect regions of HTML from being modified by text processing tools.
htmlPreserve(x)
extractPreserveChunks(strval)
restorePreserveChunks(strval, chunks)
x: A character vector of HTML to be preserved.
strval: Input string from which to extract/restore chunks.
chunks: The chunks element of the return value of extractPreserveChunks.
htmlPreserve returns a single-element character vector with
"magic" HTML comments surrounding the original text (unless the original
text was empty, in which case an empty string is returned).
extractPreserveChunks returns a list with two named elements:
value is the string with the regions replaced, and chunks is
a named character vector where the names are the IDs and the values are the
regions that were extracted.
restorePreserveChunks returns a character vector with the
chunk IDs replaced with their original values.
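The return values described above can be inspected directly. The following is a minimal sketch, assuming these functions are loaded from the htmltools package; the exact IDs and comment markers are generated internally, so no specific output is shown.

```r
library(htmltools)  # assumption: the package that provides these functions

preserved <- htmlPreserve("<b>raw</b>")  # original text wrapped in "magic" comments
empty <- htmlPreserve("")                # empty input is documented to return ""

extracted <- extractPreserveChunks(paste("before", preserved, "after"))
names(extracted)   # the two documented elements, value and chunks
extracted$value    # input string with the preserved region replaced by an ID
extracted$chunks   # named character vector: names are IDs, values are regions

# Substitute the IDs back to recover the original region
restored <- restorePreserveChunks(extracted$value, extracted$chunks)
```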
Text processing tools like markdown and pandoc are designed to turn
human-friendly markup into common output formats like HTML. This works well
for most prose, but components that generate their own HTML may break if
their markup is interpreted as the input language. The htmlPreserve
function is used to mark regions of an input document as containing pure HTML
that must not be modified. This is achieved by substituting each such region
with a benign but unique string before processing, and undoing those
substitutions after processing.
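The substitute-then-restore idea can be sketched in base R alone. This is only an illustration of the technique, not how the functions documented here are implemented: protect_regions, restore_regions, and the PRESERVE%04d placeholder format are hypothetical names invented for this example, and the regions are located with a regular expression instead of "magic" comments.

```r
# Hypothetical helper (not part of the documented API): swap every match of
# `pattern` in a single string for a placeholder that a Markdown-like
# processor would leave alone.
protect_regions <- function(text, pattern) {
  m <- gregexpr(pattern, text, perl = TRUE)
  regions <- regmatches(text, m)[[1]]
  if (length(regions) == 0) {
    return(list(value = text, chunks = character(0)))
  }
  # Simplistic IDs; a real implementation would need collision-proof ones
  ids <- sprintf("PRESERVE%04d", seq_along(regions))
  regmatches(text, m) <- list(ids)
  list(value = text, chunks = setNames(regions, ids))
}

# Hypothetical helper: put each original region back in place of its ID.
restore_regions <- function(text, chunks) {
  for (id in names(chunks)) {
    text <- gsub(id, chunks[[id]], text, fixed = TRUE)
  }
  text
}

doc <- "Some *bold* prose.\n<script>alert(2*3*4);</script>\nMore *text*."
extracted <- protect_regions(doc, "<script>.*?</script>")

# Stand-in for Markdown processing, applied only to the protected text
processed <- gsub("\\*(.*?)\\*", "<em>\\1</em>", extracted$value, perl = TRUE)
result <- restore_regions(processed, extracted$chunks)
result
#> [1] "Some <em>bold</em> prose.\n<script>alert(2*3*4);</script>\nMore <em>text</em>."
```

Because the script body is hidden behind a placeholder while the emphasis substitution runs, the asterisks inside it are never seen by the processor.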
# htmlPreserve will prevent "<script>alert(10*2*3);</script>"
# from getting an <em> tag inserted in the middle
markup <- paste(sep = "\n",
  "This is *emphasized* text in markdown.",
  htmlPreserve("<script>alert(10*2*3);</script>"),
  "Here is some more *emphasized text*."
)
extracted <- extractPreserveChunks(markup)
markup <- extracted$value
# Just think of this next line as Markdown processing
output <- gsub("\\*(.*?)\\*", "<em>\\1</em>", markup)
output <- restorePreserveChunks(output, extracted$chunks)
output
#> [1] "This is <em>emphasized</em> text in markdown.\n<script>alert(10*2*3);</script>\nHere is some more <em>emphasized text</em>."