
Gets the Wikidata Q identifier of one or more Wikipedia pages

Usage

tw_get_wikipedia_page_qid(
  url = NULL,
  title = NULL,
  language = tidywikidatar::tw_get_language(),
  cache = NULL,
  overwrite_cache = FALSE,
  cache_connection = NULL,
  disconnect_db = TRUE,
  wait = 1,
  attempts = 10
)

Arguments

url

A character vector with the full URL to one or more Wikipedia pages. If given, title and language can be left empty.

title

Title of a Wikipedia page, or the final part of its URL. If given, url can be left empty, but language must be provided, either directly or through the default set with tw_set_language().

language

Two-letter language code that defines which Wikipedia version to use. Defaults to the language set with tw_set_language(); if none is set, "en". If url is given, this can be left empty.

cache

Defaults to NULL. If given, it should be either TRUE or FALSE. Typically set with tw_enable_cache() or tw_disable_cache().

overwrite_cache

Logical, defaults to FALSE. If TRUE, it overwrites the table in the local SQLite database. Useful if the original Wikidata object has been updated.

cache_connection

Defaults to NULL. If NULL and caching is enabled, tidywikidatar will use a local SQLite database. A custom connection to another database can be given (see the caching vignette for details).

disconnect_db

Defaults to TRUE. If FALSE, leaves the connection to cache open.

wait

Time to wait between queries to the APIs, in seconds. Defaults to 1, because frequent queries can lead to time-outs. The wait time is not applied when data are cached locally. If you are running many queries systematically, you may want to increase it.

attempts

Defaults to 10. Number of times it re-attempts to reach the API before failing.
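
The interplay of wait and attempts can be pictured as a simple retry loop. The following is a minimal sketch in base R, not the package's own implementation: retry_with_wait() and flaky_request() are hypothetical names introduced only for illustration, and how tidywikidatar applies these arguments internally may differ.

```r
# Hypothetical helper, not part of tidywikidatar: call `fun` up to
# `attempts` times, pausing `wait` seconds between failed tries.
retry_with_wait <- function(fun, wait = 1, attempts = 10) {
  stopifnot(attempts >= 1)
  for (i in seq_len(attempts)) {
    result <- tryCatch(fun(), error = function(e) e)
    if (!inherits(result, "error")) {
      return(result)
    }
    # No need to wait after the final failed attempt
    if (i < attempts) {
      Sys.sleep(wait)
    }
  }
  stop("Failed after ", attempts, " attempts: ", conditionMessage(result))
}

# Stand-in for an API request that fails twice, then succeeds
calls <- 0
flaky_request <- function() {
  calls <<- calls + 1
  if (calls < 3) stop("temporary failure")
  "ok"
}

retry_result <- retry_with_wait(flaky_request, wait = 0.1, attempts = 5)
```

With the defaults shown in Usage, a page that never responds would, under this model, be retried 10 times with roughly one second between tries before an error is raised.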

Value

A data frame with six columns, including qid with Wikidata identifiers, and a logical column, disambiguation, that flags when a disambiguation page is returned.
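
The disambiguation column lends itself to a quick filter before the identifiers are used further. The sketch below works on a mock data frame rather than real output: only the qid and disambiguation column names come from the description above, while the title column, the placeholder identifiers, and the assumption that unmatched pages appear as NA are illustrative.

```r
# Mock result, NOT real output: `title` and all values are made up;
# only the `qid` and `disambiguation` column names are documented above.
pages <- data.frame(
  title = c("Example A", "Example B", "Example C"),
  qid = c("Q-example-1", "Q-example-2", NA),
  disambiguation = c(FALSE, TRUE, FALSE),
  stringsAsFactors = FALSE
)

# Rows with an identifier that does not point to a disambiguation page
usable <- pages[!is.na(pages$qid) & !pages$disambiguation, ]

# Rows that likely need a manual check or a more specific title
to_review <- pages[is.na(pages$qid) | pages$disambiguation, ]
```

Splitting the result this way keeps ambiguous titles out of later lookups while preserving them for review.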

Examples

if (interactive()) {
  tw_get_wikipedia_page_qid(title = "Margaret Mead", language = "en")

  # check when Wikipedia returns disambiguation page
  tw_get_wikipedia_page_qid(title = c("Rome", "London", "New York", "Vienna"))
}
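
A few further calls may help illustrate the other arguments. This is a sketch that uses only the arguments and helper functions named on this page; the specific values for wait and attempts are arbitrary choices, and the calls need network access, so they are wrapped in interactive().

```r
page_url <- "https://en.wikipedia.org/wiki/Margaret_Mead"

if (interactive()) {
  library("tidywikidatar")

  # With a full URL, title and language can be left empty
  tw_get_wikipedia_page_qid(url = page_url)

  # Set a session default so that `language` can be omitted with titles
  tw_set_language(language = "en")
  tw_get_wikipedia_page_qid(title = "Margaret Mead")

  # Skip the cache for one call, with a longer pause and fewer retries
  tw_get_wikipedia_page_qid(
    title = "Margaret Mead",
    cache = FALSE,
    wait = 2,
    attempts = 3
  )
}
```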