Use "magic" HTML comments to protect regions of HTML from being modified by text processing tools.



restorePreserveChunks(strval, chunks)



A character vector of HTML to be preserved.


Input string from which to extract/restore chunks.


The chunks element of the return value of extractPreserveChunks.


htmlPreserve returns a single-element character vector with "magic" HTML comments surrounding the original text (unless the original text was empty, in which case an empty string is returned).

extractPreserveChunks returns a list with two named elements: value is the string with the regions replaced, and chunks is a named character vector where the names are the IDs and the values are the regions that were extracted.

restorePreserveChunks returns a character vector with the chunk IDs replaced with their original values.


Text processing tools like markdown and pandoc are designed to turn human-friendly markup into common output formats like HTML. This works well for most prose, but components that generate their own HTML may break if their markup is interpreted as the input language. The htmlPreserve function is used to mark regions of an input document as containing pure HTML that must not be modified. This is achieved by substituting each such region with a benign but unique string before processing, and undoing those substitutions after processing.


# htmlPreserve will prevent "<script>alert(10*2*3);</script>"
# from getting an <em> tag inserted in the middle
markup <- paste(sep = "\n",
  "This is *emphasized* text in markdown.",
  "Here is some more *emphasized text*."
extracted <- extractPreserveChunks(markup)
markup <- extracted$value
# Just think of this next line as Markdown processing
output <- gsub("\\*(.*?)\\*", "<em>\\1</em>", markup)
output <- restorePreserveChunks(output, extracted$chunks)
#> [1] "This is <em>emphasized</em> text in markdown.\n<script>alert(10*2*3);</script>\nHere is some more <em>emphasized text</em>."