tagQuery() provides a jQuery-inspired interface to query and modify HTML fragments in R. Learning how to use tagQuery() directly gives you a very fast and flexible way to extract and modify HTML tags. Some other htmltools functions, like tagAppendAttributes(), actually use tagQuery() when .cssSelector is supplied. However, they offer only a subset of the functionality the tagQuery() API provides, and they can be significantly slower when multiple modifications are needed (for details, see Performance).

To create a tagQuery() object, pass it either a tag() (e.g., div()) or tagList():

library(htmltools)
tagQuery(div(a()))
#> `$allTags()`:
#> <div>
#>   <a></a>
#> </div>
#> 
#> `$selectedTags()`: `$allTags()`

Notice how tagQuery() tracks two essential pieces: the input tag(s) as well as selected tags (by default the input tag(s) are selected). This data structure allows us to efficiently query, modify, and replace particular fragments of the root HTML tag.

Since tagQuery() isn’t itself a tag() object, it can’t be passed directly to tag() or to tag rendering functions. At any time, though, you can extract either $allTags() or $selectedTags().
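Here is a minimal sketch of getting back to ordinary tag objects. $allTags() returns the whole tree. $selectedTags() returns a plain list of the currently selected tags. Either result can then be rendered with standard htmltools functions such as as.character(). The link text "link" is purely illustrative.

```r
library(htmltools)

tq <- tagQuery(div(a("link")))

# The whole tree, returned as a regular tag object that can be rendered
root <- tq$allTags()
root_html <- as.character(root)

# The current selection is always a plain list of tags
sel <- tq$find("a")$selectedTags()
sel_html <- as.character(sel[[1]])
```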

Query

tagQuery() has numerous methods to select (i.e., query) HTML tag(s). Every query method accepts a CSS selector for targeting particular tags of interest. At the moment, tagQuery() only supports a combination of type (e.g., div), class (e.g., .my-class), id (e.g., #myID), and universal (*) selectors within a given simple selector.
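The sketch below shows how those supported selector types combine within a single simple selector. It uses $find() and $length(), which are introduced just below. The class name "foo" and id "first" are arbitrary example values.

```r
library(htmltools)

html <- div(
  span(class = "foo", id = "first"),
  span(class = "foo"),
  p(class = "foo")
)
tq <- tagQuery(html)

n_type_class    <- tq$find("span.foo")$length()        # type + class
n_type_class_id <- tq$find("span.foo#first")$length()  # type + class + id
n_class         <- tq$find(".foo")$length()            # class only
n_all           <- tq$find("*")$length()               # universal: every descendant
```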

Children

To begin querying tags, start with either $find() or $children(). The former traverses all descendants whereas the latter only considers direct descendants.

(html <- div(span("foo"), div(span("bar"))))
#> <div>
#>   <span>foo</span>
#>   <div>
#>     <span>bar</span>
#>   </div>
#> </div>
tagQ <- tagQuery(html)
tagQ$find("span")$selectedTags()
#> [[1]]
#> <span>foo</span>
#> 
#> [[2]]
#> <span>bar</span>
tagQ$find("span")$length()
#> [1] 2
tagQ$children("span")$selectedTags()
#> [[1]]
#> <span>foo</span>
tagQ$children("span")$length()
#> [1] 1

And since $find() considers all descendants, it allows for descendant selectors (space) and direct child selectors (>).

(html <- div(div(span(a()))))
#> <div>
#>   <div>
#>     <span>
#>       <a></a>
#>     </span>
#>   </div>
#> </div>
tagQ <- tagQuery(html)
tagQ$find("div a")$selectedTags()
#> [[1]]
#> <a></a>
tagQ$find("div > a")$selectedTags()
#> named list()
tagQ$find("div > span > a")$selectedTags()
#> [[1]]
#> <a></a>

Since tagQuery() methods may be chained together, you could also implement tagQ$find("div > span > a") as:

tagQ$find("div")$children("span")$children("a")$selectedTags()
#> [[1]]
#> <a></a>

Siblings

Although tagQuery() doesn’t (currently) support sibling selectors (+ and ~), it does provide a $siblings() method, which offers essentially the same functionality:

(html <- div(a(), span(), p()))
#> <div>
#>   <a></a>
#>   <span></span>
#>   <p></p>
#> </div>
tagQ <- tagQuery(html)
# The moral equivalent to `tagQ$find("a ~ span")`
tagQ$find("a")$siblings("span")$selectedTags()
#> [[1]]
#> <span></span>
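One difference from the CSS `~` combinator is worth keeping in mind. The sketch below assumes that $siblings() matches siblings on both sides of each selected tag, as jQuery’s .siblings() does, whereas `a ~ span` would only match spans that come after the `<a>`. It also assumes that omitting the selector selects every sibling.

```r
library(htmltools)

html <- div(span("before"), a(), span("after"), p())
tq <- tagQuery(html)

# Spans on either side of <a> are matched (assumed jQuery-like behavior)
n_span_sibs <- tq$find("a")$siblings("span")$length()
# With no selector, all siblings of <a> are selected
n_all_sibs <- tq$find("a")$siblings()$length()
```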

Parents

In some cases, after finding children, it can be useful to traverse back up the tag tree to find particular ancestors of a selection. Similar to the difference between $find() and $children(), $parents() traverses all ancestors whereas $parent() considers only the direct parent.

(html <- div(div(a(class = "foo")), span(a())))
#> <div>
#>   <div>
#>     <a class="foo"></a>
#>   </div>
#>   <span>
#>     <a></a>
#>   </span>
#> </div>
tagQ <- tagQuery(html)
tagQ$find("a.foo")$parent()$selectedTags()
#> [[1]]
#> <div>
#>   <a class="foo"></a>
#> </div>
tagQ$find("a.foo")$parents()$selectedTags()
#> [[1]]
#> <div>
#>   <a class="foo"></a>
#> </div>
#> 
#> [[2]]
#> <div>
#>   <div>
#>     <a class="foo"></a>
#>   </div>
#>   <span>
#>     <a></a>
#>   </span>
#> </div>
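Both methods also accept a CSS selector to keep only matching ancestors. The following sketch reuses the same HTML as above.

```r
library(htmltools)

html <- div(div(a(class = "foo")), span(a()))
tq <- tagQuery(html)

# The direct parents of the two <a> tags are a <div> and a <span>;
# the selector keeps only the <span>
n_span_parent <- tq$find("a")$parent("span")$length()

# a.foo has no <span> anywhere among its ancestors
n_span_ancestors <- tq$find("a.foo")$parents("span")$length()
```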

Filter

The $filter() method can be used to subset selected tags using an R function or CSS selector. When combined with the universal selector (*), $filter() is particularly useful as a workaround for the fact that tagQuery() doesn’t fully support the entire CSS selector specification. For example, here’s a workaround for tagQuery()’s current lack of support for attribute selectors:

(html <- div(div(), div("data-foo" = "bar")))
#> <div>
#>   <div></div>
#>   <div data-foo="bar"></div>
#> </div>
tagQ <- tagQuery(html)
# The moral equivalent to `tagQ$find("[data-foo]")`
tagQ$
  find("*")$
  filter(function(x, i) tagHasAttribute(x, "data-foo"))$
  selectedTags()
#> [[1]]
#> <div data-foo="bar"></div>
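As described above, $filter() accepts either a CSS selector or a function of the tag `x` and its index `i`. This short sketch shows both forms. The class names "a" and "b" are arbitrary.

```r
library(htmltools)

html <- div(span(class = "a"), span(class = "b"), span())
tq <- tagQuery(html)

# Filter with a CSS selector
n_css <- tq$find("span")$filter(".a")$length()

# Filter with a function: keep only spans without a class attribute
n_fn <- tq$
  find("span")$
  filter(function(x, i) !tagHasAttribute(x, "class"))$
  length()
```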

Reset

To reset the set of selected tags to the root tag, use $resetSelected():

(html <- div(a()))
#> <div>
#>   <a></a>
#> </div>
tagQ <- tagQuery(html)$find("a")
tagQ$selectedTags()
#> [[1]]
#> <a></a>
tagQ$resetSelected()$selectedTags()
#> [[1]]
#> <div>
#>   <a></a>
#> </div>
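$resetSelected() is handy in the middle of a chain. You can select one tag, change it, jump back to the root, and then select another. This sketch uses the $addClass() modifier covered in the next section. The class names "x" and "y" are arbitrary.

```r
library(htmltools)

tq <- tagQuery(div(a(), span()))

# Modify <a>, return to the root, then modify <span>
tq$
  find("a")$
  addClass("x")$
  resetSelected()$
  find("span")$
  addClass("y")

out <- as.character(tq$allTags())
```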

Modify

tagQuery() provides numerous functions for modifying HTML attributes, children, or sibling tags of the current query selection. Unlike query methods, modifier methods modify their input (both the root and the selection). For example, note how the $addClass() call here modifies tagQ (but $find() doesn’t):

(html <- div(a()))
#> <div>
#>   <a></a>
#> </div>
tagQ <- tagQuery(html)
tagQ$
  find("a")$
  addClass("foo")$
  allTags()
#> <div>
#>   <a class="foo"></a>
#> </div>

The mutable behavior of modifier methods not only allows us to modify child tags without losing a reference to the root tag, but it also makes modifications more performant than they’d otherwise be.
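The sketch below illustrates this shared, mutable state. A selection created with $find() is modified, and the change is visible through the original tagQuery object. The plain `html` value passed in stays an ordinary, unchanged R object, which is an expectation based on R’s copy semantics for lists.

```r
library(htmltools)

html <- div(a())
tq <- tagQuery(html)

# Query methods return a new selection; tq itself still selects the root
links <- tq$find("a")
n_root <- tq$length()

# Modifier methods change the shared tree, so tq sees the new class
links$addClass("foo")
modified <- as.character(tq$allTags())

# The original `html` list is left untouched
original <- as.character(html)
```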

Attributes

Use $addAttrs() to add and $removeAttrs() to remove any HTML attribute from each selected tag. If you’re just working with class attributes, consider using the more convenient $addClass(), $removeClass(), or $toggleClass().

(html <- div(span(a()), span()))
#> <div>
#>   <span>
#>     <a></a>
#>   </span>
#>   <span></span>
#> </div>
tagQ <- tagQuery(html)
# Equivalent to tagAppendAttributes(html, .cssSelector = "span", "data-bar" = "foo")
tagQ$
  find("span")$
  addAttrs("data-bar" = "foo")$
  allTags()
#> <div>
#>   <span data-bar="foo">
#>     <a></a>
#>   </span>
#>   <span data-bar="foo"></span>
#> </div>

Also, to check whether each selected tag has a certain attribute, use $hasAttrs() (or $hasClass()):

tagQ$find("span")$hasAttrs("data-bar")
#> [1] TRUE TRUE
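The class helpers work on each selected tag independently. This sketch assumes that $toggleClass() removes a class where it is present and adds it where it is absent, matching jQuery’s .toggleClass(). The class names and the `title` attribute are arbitrary examples.

```r
library(htmltools)

tq <- tagQuery(div(span(class = "a b"), span()))
spans <- tq$find("span")

# One logical value per selected tag
before <- spans$hasClass("a")

# Toggle "a": removed from the first span, added to the second
spans$toggleClass("a")
after <- spans$hasClass("a")

# Add an attribute to every span, then remove it again
spans$addAttrs(title = "tip")$removeAttrs("title")
has_title <- spans$hasAttrs("title")
```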

Children

Use $prepend() to insert content before the children of each selected tag and $append() to insert content after:

(html <- div(p(a())))
#> <div>
#>   <p>
#>     <a></a>
#>   </p>
#> </div>
tagQ <- tagQuery(html)
# Equivalent to html %>% tagInsertChildren(.cssSelector = "p", after = 0, span()) %>% tagAppendChildren(.cssSelector = "p", tags$table())
tagQ$
  find("p")$
  prepend(span())$
  append(tags$table())$
  allTags()
#> <div>
#>   <p>
#>     <span></span>
#>     <a></a>
#>     <table></table>
#>   </p>
#> </div>

If you’d like to replace all the children, first call $empty() and then $append(). If you’d like to remove only particular child tags, call $children() followed by $remove() (see Replace).
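Both patterns are sketched below. This assumes that $empty() keeps the current selection, so $append() still targets the emptied `<p>`. The replacement content strong("new") is only an example.

```r
library(htmltools)

# Replace all children of <p>: empty it, then append new content
tq1 <- tagQuery(div(p(a(), span(), a())))
tq1$find("p")$empty()$append(strong("new"))
replaced <- as.character(tq1$allTags())

# Remove only particular children: select them, then call $remove()
tq2 <- tagQuery(div(p(a(), span(), a())))
tq2$find("p")$children("a")$remove()
pruned <- as.character(tq2$allTags())
```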

Siblings

Use $before() to insert content before each selected tag and $after() to insert content after:

(html <- div(p(a())))
#> <div>
#>   <p>
#>     <a></a>
#>   </p>
#> </div>
tagQ <- tagQuery(html)
tagQ$
  find("a")$
  before(span())$
  after(tags$table())$
  allTags()
#> <div>
#>   <p>
#>     <span></span>
#>     <a></a>
#>     <table></table>
#>   </p>
#> </div>
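When several tags are selected, $before() and $after() insert content next to each one of them. In this sketch, a span is inserted after each of two links, and the number of inserted spans is counted in the rendered HTML.

```r
library(htmltools)

tq <- tagQuery(div(a(), a()))

# Insert a <span> after *each* selected <a>
tq$find("a")$after(span())
out <- as.character(tq$allTags())

# Count the inserted spans in the rendered HTML
n_spans <- lengths(regmatches(out, gregexpr("<span></span>", out, fixed = TRUE)))
```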

Replace

As with tagQuery()’s modifier methods, its replace methods also modify their input. In addition, $replaceWith() and $remove() clear the current selection, so call $resetSelected() if you want to make more queries or modifications afterwards.

Use $replaceWith() to replace selected tags with some other content.

(html <- div(a()))
#> <div>
#>   <a></a>
#> </div>
tagQ <- tagQuery(html)
tagQ$
  find("a")$
  replaceWith(p())$
  allTags()
#> <div>
#>   <p></p>
#> </div>
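The following sketch makes the cleared selection visible. It assumes that the object returned by $replaceWith() has an empty selection, which can be reset to the root before querying again.

```r
library(htmltools)

tq <- tagQuery(div(a()))

# After replacing, the selection is cleared ...
sel <- tq$find("a")$replaceWith(p())
n_after_replace <- sel$length()

# ... so reset to the root before making further queries
n_p <- sel$resetSelected()$find("p")$length()
```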

Use $remove() to replace selected tags with nothing:

tagQ <- tagQuery(html)
tagQ$find("a")$remove()$allTags()
#> <div></div>

And use $empty() to replace the children of the selected tags with nothing:

(html <- div(span(a())))
#> <div>
#>   <span>
#>     <a></a>
#>   </span>
#> </div>
tagQ <- tagQuery(html)
tagQ$find("span")$empty()$allTags()
#> <div>
#>   <span></span>
#> </div>

Performance

One main reason tagQuery() is fast is that it converts the underlying tag() list structure into an environment (i.e., a reference object). As a result, tagQuery() can keep a reference to selected tags and modify them without having to re-find those tags for each modification. So even though you can achieve multiple modifications via multiple calls to tagAppendAttributes(), tagInsertChildren(), etc. (with a .cssSelector), you should consider using tagQuery() directly instead.

For a basic example, since tagQuery() can prepend and append in one shot, it’s roughly twice as fast as calling tagInsertChildren() and then tagAppendChildren() (with the same .cssSelector). Internally, the latter approach calls tagQuery(html)$find() twice, which is why it’s about twice as slow.

library(magrittr)
html <- div(p(a()))
bench::mark(
  tagQuery = tagQuery(html)$
    find("p")$
    prepend(span())$
    append(tags$table())$
    allTags(),
  tagAppend = html %>%
    tagInsertChildren(.cssSelector = "p", after = 0, span()) %>%
    tagAppendChildren(.cssSelector = "p", tags$table()),
  check = FALSE,
  time_unit = "us"
)
#> # A tibble: 2 × 6
#>   expression   min median `itr/sec` mem_alloc `gc/sec`
#>   <bch:expr> <dbl>  <dbl>     <dbl> <bch:byt>    <dbl>
#> 1 tagQuery    703.   734.     1352.    11.4KB     19.3
#> 2 tagAppend  1278.  1335.      744.    50.5KB     17.9
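Because the benchmark above uses check = FALSE, bench::mark() does not verify that the two expressions return the same result. The sketch below compares the rendered HTML of both approaches instead. It uses the base R pipe (R >= 4.1) in place of magrittr’s %>%.

```r
library(htmltools)

html <- div(p(a()))

via_query <- tagQuery(html)$
  find("p")$
  prepend(span())$
  append(tags$table())$
  allTags()

via_helpers <- html |>
  tagInsertChildren(.cssSelector = "p", after = 0, span()) |>
  tagAppendChildren(.cssSelector = "p", tags$table())

# Compare the rendered markup rather than the underlying objects
same_html <- identical(as.character(via_query), as.character(via_helpers))
```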