tagQuery() provides a jQuery-inspired interface to query and modify HTML fragments in R. Learning how to use tagQuery() directly gives you a fast and flexible way to extract and modify HTML tags. Some other htmltools functions, like tagAppendAttributes(), actually use tagQuery() when .cssSelector is supplied, but they offer only a subset of the functionality the tagQuery() API provides and can be significantly slower when multiple modifications are needed (for details, see performance).
To create a tagQuery() object, pass it either a tag() (e.g., div()) or tagList():
library(htmltools)
tagQuery(div(a()))
#> `$allTags()`:
#> <div>
#> <a></a>
#> </div>
#>
#> `$selectedTags()`: `$allTags()`
Notice how tagQuery() tracks two essential pieces: the input tag(s) as well as selected tags (by default the input tag(s) are selected). This data structure allows us to efficiently query, modify, and replace particular fragments of the root HTML tag.
Since tagQuery() isn’t itself a tag() object, it can’t be passed directly to tag() or tag rendering functions, but at any given time you can extract ordinary tags from it with $allTags() or $selectedTags().
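As a minimal sketch of that extraction step (the variable names all_tags, rendered, and selected are only illustrative), the following pulls tags back out of a query object so they can be rendered like any other htmltools tag:

```r
library(htmltools)

tagQ <- tagQuery(div(a()))

# $allTags() returns the (possibly modified) input as ordinary tag objects,
# which rendering functions such as as.character() accept.
all_tags <- tagQ$allTags()
rendered <- as.character(all_tags)

# $selectedTags() returns only the tags in the current selection.
selected <- tagQ$find("a")$selectedTags()
length(selected)
```

The query object itself stays available afterwards, so extraction can happen at any point in a chain of queries or modifications.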
tagQuery() has numerous methods to select (i.e., query) HTML tag(s). Every query method accepts a CSS selector for targeting particular tags of interest. At the moment, tagQuery() only supports a combination of type (e.g., div), class (e.g., .my-class), id (e.g., #myID), and universal (*) selectors within a given simple selector.
To begin querying tags, start with either $find() or $children(). The former traverses all descendants whereas the latter only considers direct descendants.
(html <- div(span("foo"), div(span("bar"))))
#> <div>
#> <span>foo</span>
#> <div>
#> <span>bar</span>
#> </div>
#> </div>
tagQ <- tagQuery(html)
tagQ$find("span")$selectedTags()
#> [[1]]
#> <span>foo</span>
#>
#> [[2]]
#> <span>bar</span>
tagQ$find("span")$length()
#> [1] 2
tagQ$children("span")$selectedTags()
#> [[1]]
#> <span>foo</span>
tagQ$children("span")$length()
#> [1] 1
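The supported selector types can be tried out in the same way. The sketch below uses a made-up input (the class name my-class and id myID are just the placeholders from the text) and counts matches for a type plus class selector, an id selector, and the universal selector:

```r
library(htmltools)

html <- div(
  span(class = "my-class", "foo"),
  span(id = "myID", "bar"),
  p("baz")
)
tagQ <- tagQuery(html)

# Type and class combined in one simple selector
n_class <- tagQ$find("span.my-class")$length()

# Id selector
n_id <- tagQ$find("#myID")$length()

# Universal selector: every direct child of the selected <div>
n_children <- tagQ$children("*")$length()

c(n_class, n_id, n_children)
```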
And since $find() considers all descendants, it allows for descendant selectors (space) and direct child selectors (>).
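# Example input consistent with the output shown below: an <a> nested in div > span
html <- div(div(span(a())))
tagQ <- tagQuery(html)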
tagQ$find("div a")$selectedTags()
#> [[1]]
#> <a></a>
tagQ$find("div > a")$selectedTags()
#> named list()
tagQ$find("div > span > a")$selectedTags()
#> [[1]]
#> <a></a>
Since tagQuery() methods may be chained together, you could also implement tagQ$find("div > span > a") as:
tagQ$find("div")$children("span")$children("a")$selectedTags()
#> [[1]]
#> <a></a>
Although tagQuery() doesn’t (currently) support sibling selectors (+ and ~), it does provide a $siblings() method, which provides essentially the same functionality:
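# Example input consistent with the output shown below: an <a> with a <span> sibling
html <- div(a(), span())
tagQ <- tagQuery(html)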
# The moral equivalent to `tagQ$find("a ~ span")`
tagQ$find("a")$siblings("span")$selectedTags()
#> [[1]]
#> <span></span>
In some cases, after finding children, it can be useful to traverse back up the tag tree to find particular ancestors of a selection. Similar to the difference between $find() and $children(), $parents() traverses all ancestors whereas $parent() considers only the direct parent.
(html <- div(div(a(class = "foo")), span(a())))
#> <div>
#> <div>
#> <a class="foo"></a>
#> </div>
#> <span>
#> <a></a>
#> </span>
#> </div>
tagQ <- tagQuery(html)
tagQ$find("a.foo")$parent()$selectedTags()
#> [[1]]
#> <div>
#> <a class="foo"></a>
#> </div>
tagQ$find("a.foo")$parents()$selectedTags()
#> [[1]]
#> <div>
#> <a class="foo"></a>
#> </div>
#>
#> [[2]]
#> <div>
#> <div>
#> <a class="foo"></a>
#> </div>
#> <span>
#> <a></a>
#> </span>
#> </div>
The $filter() method can be used to subset selected tags using an R function or CSS selector. When combined with the universal selector (*), $filter() is particularly useful as a workaround for the fact that tagQuery() doesn’t fully support the entire CSS selector specification. For example, here’s a workaround for tagQuery()’s current lack of support for attribute selectors:
(html <- div(div(), div("data-foo" = "bar")))
#> <div>
#> <div></div>
#> <div data-foo="bar"></div>
#> </div>
tagQ <- tagQuery(html)
# The moral equivalent to `tagQ$find("[data-foo]")`
tagQ$
find("*")$
filter(function(x, i) tagHasAttribute(x, "data-foo"))$
selectedTags()
#> [[1]]
#> <div data-foo="bar"></div>
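Since $filter() also accepts a CSS selector, the two forms can be compared side by side. This is a small sketch on a made-up input (the class name keep is purely illustrative), not a statement about which form is preferable:

```r
library(htmltools)

html <- div(div(class = "keep"), div(), span(class = "keep"))
tagQ <- tagQuery(html)

# Selector form: keep only the selected tags that match ".keep"
by_selector <- tagQ$find("*")$filter(".keep")

# Function form: keep only the selected tags that have a class attribute
by_function <- tagQ$find("*")$filter(function(x, i) tagHasAttribute(x, "class"))

by_selector$length()
by_function$length()
```

On this input both calls select the same two tags; the function form is the one to reach for when the condition can’t be written with the supported selectors.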
tagQuery() provides numerous methods for modifying HTML attributes, children, or sibling tags of the current query selection. Unlike query methods, modifier methods modify their input (both the root and the selection). For example, note how the $addClass() call here modifies tagQ (but $find() doesn’t):
# Example input consistent with the output shown below
html <- div(a())
tagQ <- tagQuery(html)
tagQ$
find("a")$
addClass("foo")$
allTags()
#> <div>
#> <a class="foo"></a>
#> </div>
The mutable behavior of modifier methods not only allows us to modify child tags without losing a reference to the root tag, but it also makes modifications more performant than they’d otherwise be.
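To make the mutable behavior concrete, here is a minimal sketch (the variable names modified and original are illustrative) showing that a modification made through a chained query is visible from the original query object, while the plain tag passed to tagQuery() is unaffected:

```r
library(htmltools)

html <- div(a())
tagQ <- tagQuery(html)

# $find() returns a new query object; $addClass() then modifies the tags
# that both query objects share.
tagQ$find("a")$addClass("foo")

# The change is visible through the original query object...
modified <- as.character(tagQ$allTags())

# ...while the ordinary tag passed to tagQuery() is left untouched.
original <- as.character(html)

modified
original
```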
Use $addAttrs() to add and $removeAttrs() to remove any HTML attribute from each selected tag. If you’re just working with class attributes, consider using the more convenient $addClass(), $removeClass(), or $toggleClass():
(html <- div(span(a()), span()))
#> <div>
#> <span>
#> <a></a>
#> </span>
#> <span></span>
#> </div>
tagQ <- tagQuery(html)
# Equivalent to tagAppendAttributes(html, .cssSelector = "span", "data-bar" = "foo")
tagQ$
find("span")$
addAttrs("data-bar" = "foo")$
allTags()
#> <div>
#> <span data-bar="foo">
#> <a></a>
#> </span>
#> <span data-bar="foo"></span>
#> </div>
Also, to check whether each selected tag has a certain attribute, use $hasAttrs() (or $hasClass()):
tagQ$find("span")$hasAttrs("data-bar")
#> [1] TRUE TRUE
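The class helpers can be sketched the same way. The input and the class names a, b, and c below are made up for illustration:

```r
library(htmltools)

html <- div(span(class = "a"), span())
spans <- tagQuery(html)$find("span")

# Modifier methods change the selected tags in place
spans$addClass("b")      # add "b" to both spans
spans$removeClass("a")   # drop "a" wherever it is present
spans$toggleClass("c")   # add "c" where absent, remove it where present

has_a <- spans$hasClass("a")
has_b <- spans$hasClass("b")
has_c <- spans$hasClass("c")
```

Each $hasClass() call returns one logical value per selected tag, just like $hasAttrs().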
Use $prepend() to insert content before the children of each selected tag and $append() to insert content after:
html <- div(p(a()))
tagQ <- tagQuery(html)
# Equivalent to html %>% tagInsertChildren(.cssSelector = "p", after = 0, span()) %>% tagAppendChildren(.cssSelector = "p", tags$table())
tagQ$
find("p")$
prepend(span())$
append(tags$table())$
allTags()
#> <div>
#> <p>
#> <span></span>
#> <a></a>
#> <table></table>
#> </p>
#> </div>
If you’d like to replace all the children, first call $empty() and then $append(). If you’d like to remove only particular child tags, select them with $children() and then call $remove().
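Both patterns can be sketched on a small made-up input (the variable names replaced and pruned are illustrative):

```r
library(htmltools)

html <- div(p(a(), span()))

# Replace all children of <p>: empty it first, then append the new content
replaced <- tagQuery(html)$
  find("p")$
  empty()$
  append(tags$b("new"))$
  allTags()

# Remove only the <a> children of <p>, leaving the <span> in place
pruned <- tagQuery(html)$
  find("p")$
  children("a")$
  remove()$
  allTags()

replaced
pruned
```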
As with tagQuery()’s modifier methods, its replace methods also modify their input. Replacing or removing tags also empties the selection, so you may want to call $resetSelection() if you want to make more queries or modifications afterwards.
Use $replaceWith() to replace selected tags with some other content.
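A minimal sketch of $replaceWith(), assuming a simple input with a single <a> that is swapped for an empty <p>:

```r
library(htmltools)

html <- div(a())
tagQ <- tagQuery(html)

# Replace every selected <a> with a <p>
result <- tagQ$find("a")$replaceWith(p())$allTags()
result
```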
Use $remove() to replace selected tags with nothing:
# Example input consistent with the output shown below
html <- div(a())
tagQ <- tagQuery(html)
tagQ$find("a")$remove()$allTags()
#> <div></div>
And use $empty() to replace the children of the selected tags with nothing:
# Example input consistent with the output shown below
html <- div(span(a()))
tagQ <- tagQuery(html)
tagQ$find("span")$empty()$allTags()
#> <div>
#> <span></span>
#> </div>
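To continue working after a removal, the selection can be reset first. The following sketch uses a made-up input and class name (kept) to show the empty selection after $remove() and a further modification after $resetSelection():

```r
library(htmltools)

html <- div(a(), span())

# After $remove(), nothing is selected any more
tagQ <- tagQuery(html)$find("a")$remove()
n_selected <- tagQ$length()

# Reset the selection to the root, then keep querying and modifying
tagQ <- tagQ$resetSelection()
result <- tagQ$find("span")$addClass("kept")$allTags()

n_selected
result
```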
One main reason why tagQuery() is fast is that it converts the underlying tag() list structure into an environment (i.e., a reference object). As a result, tagQuery() is able to keep a reference to selected tags and modify them without having to re-find tags for each modification. This is why, even if you can achieve multiple modifications via multiple calls to tagAppendAttributes(), tagInsertChildren(), etc. (with a .cssSelector), you should consider using tagQuery() directly instead.
For a basic example, since tagQuery() can prepend and append in one shot, it’s roughly twice as fast as calling tagInsertChildren() and then tagAppendChildren() (with the same .cssSelector). Internally, the latter approach calls tagQuery(html)$find() twice, once per function, which is why it takes about twice as long.
library(magrittr)
html <- div(p(a()))
bench::mark(
tagQuery = tagQuery(html)$
find("p")$
prepend(span())$
append(tags$table())$
allTags(),
tagAppend = html %>%
tagInsertChildren(.cssSelector = "p", after = 0, span()) %>%
tagAppendChildren(.cssSelector = "p", tags$table()),
check = FALSE,
time_unit = "us"
)
#> # A tibble: 2 × 6
#> expression min median `itr/sec` mem_alloc `gc/sec`
#> <bch:expr> <dbl> <dbl> <dbl> <bch:byt> <dbl>
#> 1 tagQuery 703. 734. 1352. 11.4KB 19.3
#> 2 tagAppend 1278. 1335. 744. 50.5KB 17.9