Core functions used to get data from Wikidata as data frames

tw_get()
Return (most) information from a Wikidata item in a tidy format
tw_get_property()
Get Wikidata property of one or more items as a tidy data frame
tw_get_qualifiers()
Get Wikidata qualifiers for a given property of a given item
tw_get_p_wide()
Efficiently get a wide table with various properties of a given set of Wikidata identifiers

Core functions that return character vectors of the same length as input

These are commonly used with piped operations.

tw_get_label()
Get Wikidata label in given language
tw_get_description()
Get Wikidata description in given language
tw_get_property_same_length() tw_get_p()
Get Wikidata property of an item as a vector or list of the same length as input
tw_get_p1()
Get Wikidata property of an item as a character vector of the same length as input
tw_get_property_label()
Get label of a Wikidata property in a given language
tw_get_property_description()
Get description of a Wikidata property in a given language
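
Because these functions return one value per input element, they fit naturally inside `dplyr::mutate()`. The following is a minimal sketch rather than an official recipe: `add_labels()` is a hypothetical helper name, and the input data frame is assumed to have an `id` column of Q identifiers.

```r
library(dplyr)

# Hypothetical helper (not part of tidywikidatar): adds a `label` column
# to a data frame that has an `id` column of Wikidata Q identifiers.
add_labels <- function(df, language = "en") {
  df %>%
    mutate(label = tidywikidatar::tw_get_label(id = id, language = language))
}
```

Calling `add_labels(tibble::tibble(id = c("Q180099", "Q228822")))` needs network access (or a populated cache) and should give back the same rows with one label each.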

Labelling

Besides the dedicated functions for labelling, tw_label() attempts to turn Q identifiers to labels in data frames such as the ones generated by the core functions listed above.

tw_label()
Gets labels for all columns with names such as "id" and "property".

Filters

These are convenience functions; the result is similar to using tw_get() and dplyr::filter(), but with less typing.

tw_filter()
Filter search result and keep only items with matching property and Q identifier
tw_filter_first()
Filter search result and keep only the first match
tw_filter_people()
Filter search result and keep only people
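
To illustrate the manual route these filters are meant to shorten, here is a sketch of the `dplyr::filter()` step on a small hand-written table. The rows and the column names (`id`, `property`, `value`) are assumptions chosen to resemble `tw_get()` output; nothing is downloaded here.

```r
library(dplyr)

# Hand-written rows for illustration only (assumed shape: id, property, value)
items <- tibble::tibble(
  id       = c("Q42", "Q42",  "Q64",  "Q64"),
  property = c("P31", "P27",  "P31",  "P17"),
  value    = c("Q5",  "Q145", "Q515", "Q183")
)

# Keep only items that are an instance of (P31) human (Q5)
people_ids <- items %>%
  filter(property == "P31", value == "Q5") %>%
  distinct(id) %>%
  pull(id)

people_ids
```

The convenience functions listed above take a search result as input, so in practice they replace both the retrieval and the filtering step.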

Functions to interact with Wikidata via Wikipedia

tw_get_wikipedia()
Get URL to a Wikipedia article corresponding to a Wikidata Q identifier in given language
tw_get_wikipedia_base_api_url()
Facilitates the creation of MediaWiki API base URLs
tw_get_wikipedia_category_members()
Get all Wikidata Q identifiers of all Wikipedia pages (or files, or subcategories) that are members of the given category
tw_get_wikipedia_category_members_single()
Get all Wikidata Q identifiers of all Wikipedia pages that are members of a given category
tw_get_wikipedia_page_links()
Get all Wikidata Q identifiers of all Wikipedia pages that appear in one or more pages
tw_get_wikipedia_page_links_single()
Get all Wikidata Q identifiers of all Wikipedia pages that appear in a given page
tw_get_wikipedia_page_qid()
Gets the Wikidata Q identifier of one or more Wikipedia pages
tw_get_wikipedia_page_qid_single()
Gets the Wikidata id of a Wikipedia page
tw_get_wikipedia_page_section_links()
Get links from a specific section of a Wikipedia page
tw_get_wikipedia_page_sections()
Get sections of a Wikipedia page
tw_get_wikipedia_page_sections_single()
Get sections of a single Wikipedia page
tw_get_wikipedia_section_links_api_url()
Facilitates the creation of MediaWiki API base URLs to retrieve links from a section of a page
tw_get_wikipedia_sections_api_url()
Facilitates the creation of MediaWiki API base URLs to retrieve sections of a page

Search Wikidata

tw_search()
Search for Wikidata items or properties and return Wikidata id, label, and description.
tw_search_item()
Search for Wikidata items and return Wikidata id, label, and description.
tw_search_property()
Search for Wikidata properties in Wikidata and return Wikidata id, label, and description.
tw_search_single()
Search for Wikidata items or properties and return Wikidata id, label, and description.
tw_query()
Perform simple Wikidata queries
tw_get_all_with_p()
Get all items that have a given property (irrespective of the value)

Retrieve information about images

Images are not stored within Wikidata, but they are often relevant to those working with it. These functions wrap the relevant APIs to facilitate getting data about images associated with a Wikidata Q identifier.

tw_get_image()
Get image from Wikimedia Commons
tw_get_image_metadata()
Get metadata for images from Wikimedia Commons
tw_get_image_metadata_single()
Get metadata for images from Wikimedia Commons
tw_get_image_same_length()
Get image from Wikimedia Commons

Functions to retrieve more detailed data that are not (yet) cached by tidywikidatar

tw_get_property_with_details()
Gets all details of a property
tw_query()
Perform simple Wikidata queries

Cache settings

tw_create_cache_folder()
Creates the base cache folder where tidywikidatar caches data.
tw_enable_cache()
Enable caching for the current session
tw_disable_cache()
Disable caching for the current session
tw_set_cache_db()
Set database connection settings for the session
tw_set_cache_folder() tw_get_cache_folder()
Set folder for caching data
tw_set_language() tw_get_language()
Set language to be used by all functions

Functions for resetting the cache

After upgrading to a new tidywikidatar version, or in the unlikely case of database corruption, these can be used to delete specific cache tables. Consider deleting the whole cache if these do not help.

tw_reset_item_cache()
Reset item cache
tw_reset_qualifiers_cache()
Reset qualifiers cache
tw_reset_wikipedia_category_members_cache()
Reset Wikipedia category members cache
tw_reset_wikipedia_page_cache()
Reset Wikipedia page cache
tw_reset_wikipedia_page_links_cache()
Reset Wikipedia page link cache
tw_reset_wikipedia_page_sections_cache()
Reset Wikipedia page sections cache

Functions used internally for dealing with cache

They are exported for easier access by advanced users, but most users can disregard them.

tw_connect_to_cache()
Return a connection to be used for caching
tw_check_cache()
Check caching status in the current session, and override it upon request
tw_check_cache_folder()
Checks if cache folder exists, if not returns an informative message
tw_check_cached_items()
Check if given items are present in cache
tw_disconnect_from_cache()
Ensure that connection to cache is disconnected consistently
tw_check_cache_index()
Check if cache table is indexed
tw_index_cache_search()
Add index to caching table for search queries for increased speed
tw_index_cache_item()
Add index to caching table for items for increased speed
tw_get_cache_db()
Get database connection settings from the environment
tw_get_cache_file()
Gets location of cache file
tw_get_cache_table_name()
Gets name of table inside the database
tw_get_cached_item()
Retrieve cached item
tw_get_cached_qualifiers()
Retrieve cached qualifier
tw_get_cached_search()
Retrieve cached search
tw_get_cached_wikipedia_category_members()
Gets members of Wikipedia categories from local cache
tw_get_cached_wikipedia_page_links()
Gets links of Wikipedia pages from local cache
tw_get_cached_wikipedia_page_qid()
Gets id of Wikipedia pages from local cache
tw_get_cached_wikipedia_page_sections()
Gets sections of Wikipedia pages from local cache
tw_set_cache_folder() tw_get_cache_folder()
Set folder for caching data
tw_write_item_to_cache()
Writes item to cache
tw_write_qid_of_wikipedia_page_to_cache()
Write Wikidata identifier (qid) of Wikipedia page to cache
tw_write_qualifiers_to_cache()
Write qualifiers to cache
tw_write_search_to_cache()
Writes search to cache
tw_write_wikipedia_category_members_to_cache()
Write Wikipedia category members to cache
tw_write_wikipedia_page_links_to_cache()
Write Wikipedia page links to cache
tw_write_wikipedia_page_sections_to_cache()
Write Wikipedia page sections to cache

Functions used internally to check for validity of data

They are exported for easier access by advanced users, but most users can disregard them.

tw_check_qid()
Ensures that input appears to be a valid Wikidata id
tw_check_pid()
Ensures that input appears to be a valid Wikidata property id (i.e. it starts with P and is followed only by digits)
tw_check_search()
Checks if an input is a search; if not, it tries to return a search
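
The rule stated for property ids (starts with P, followed only by digits) can be expressed as a regular expression. The sketch below is a stand-alone illustration in base R; `looks_like_pid()` is a hypothetical name and is not claimed to be how `tw_check_pid()` is implemented.

```r
# Hypothetical helper: TRUE when a string starts with "P" and is
# followed only by digits, FALSE otherwise (including for NA).
looks_like_pid <- function(x) {
  grepl(pattern = "^P[0-9]+$", x = x)
}

looks_like_pid(c("P31", "Q42", "P31a", "p31", NA))
```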

Extractors

These functions are mostly used internally to extract selected data from an object previously downloaded with WikidataR.

tw_extract_qualifier()
Extract qualifiers from an object of class Wikidata created with WikidataR
tw_extract_single()
Extract item data from an object of class Wikidata created with WikidataR
tw_get_field()
Gets a field such as a label or description from a data frame typically generated with tw_get()

Functions made to process only inputs of length one, used internally

The user can safely rely on the corresponding core functions; these have been created separately, mostly for testing and code structure.

tw_extract_single()
Extract item data from an object of class Wikidata created with WikidataR
tw_get_image_metadata_single()
Get metadata for images from Wikimedia Commons
tw_get_property_label_single()
Get label of a Wikidata property in a given language
tw_get_property_with_details_single()
Gets all details of a property
tw_get_qualifiers_single()
Get Wikidata qualifiers for a given property of a given item
tw_get_single()
Return (most) information from a Wikidata item in a tidy format from a single Wikidata identifier
tw_get_wikipedia_category_members_single()
Get all Wikidata Q identifiers of all Wikipedia pages that are members of a given category
tw_get_wikipedia_page_links_single()
Get all Wikidata Q identifiers of all Wikipedia pages that appear in a given page
tw_get_wikipedia_page_qid_single()
Gets the Wikidata id of a Wikipedia page
tw_get_wikipedia_page_sections_single()
Get sections of a single Wikipedia page
tw_search_single()
Search for Wikidata items or properties and return Wikidata id, label, and description.

Reference data frames

These are empty data frames (with zero rows), mostly used internally when a function does not have relevant results to return. They can also serve as reference data frames to know what to expect from a given function or as placeholders.

tw_empty_image_metadata
A zero-rows tibble used internally when tw_get_image_metadata() would not return any value.
tw_empty_item
A zero-rows tibble used internally when tw_get() would not return any value.
tw_empty_qualifiers
A zero-rows tibble used internally when tw_get_qualifiers() would not return any value.
tw_empty_search
A zero-rows tibble used internally when tw_search() would not return any value.
tw_empty_wikipedia_category_members
A zero-rows tibble used internally when tw_get_wikipedia_category_members() would not return any value.
tw_empty_wikipedia_page
A zero-rows tibble used internally when tw_get_wikipedia_page_qid() would not return any value.
tw_empty_wikipedia_page_links
A zero-rows tibble used internally when tw_get_wikipedia_page_links() would not return any value.
tw_empty_wikipedia_page_sections
A zero-rows tibble used internally when tw_get_wikipedia_page_sections() would not return any value.
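
As a quick way to see what a function is expected to return, the corresponding reference data frame can be inspected directly. This is a small sketch that assumes the package is installed; it needs no network access.

```r
library(tidywikidatar)

# Placeholder used when tw_get() has nothing to return
nrow(tw_empty_item)      # number of rows in the placeholder
colnames(tw_empty_item)  # column names to expect from tw_get()
```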

Example datasets

A few datasets with Wikidata Q identifiers, to be used for examples or testing.

tw_qid_airports
The Wikidata Q identifier of European airports found in Eurostat's avia_par_ dataset
tw_qid_meps
The Wikidata Q identifier of all members of the European Parliament since its establishment
tw_test_items
A list mostly used for testing with some Wikidata items in the format resulting from WikidataR::get_item()