tidywikidatar (development version)

tidywikidatar 0.5.9

  • drop dependency on WikipediR
  • report informative error message from the api in tw_search and related functions
  • include “tidywikidatar/version” user agent in tw_search and related functions
  • fix tw_get_wikipedia_page_section_links when a URL is provided as input

tidywikidatar 0.5.8

CRAN release: 2024-03-28

  • consistent results rather than error when searching empty string with tw_search
  • consistent results rather than error when searching empty string with tw_get_property_label
  • drop usethis dependency (PR by @olivroy)

tidywikidatar 0.5.7

CRAN release: 2023-03-10

  • fix: tw_get_wikipedia_category_members now works with categories that have more than 1000 member pages and consistently stores data in language-appropriate cache file also when language is derived from url
  • testing adjustments: more tests now rely on an embedded set of items to reduce risks of server timeouts when conducting checks, in particular from CRAN

tidywikidatar 0.5.6

CRAN release: 2023-01-31

  • fix: tw_get now caches Wikidata items that have been deleted, storing the error message as the value of property “error”
  • fix: tw_search now returns results consistently even when the description is missing in the given language (issue stemming from an update in a dependency)

tidywikidatar 0.5.5

CRAN release: 2022-10-31

  • fix: tw_search now checks cache efficiently also when cache settings are passed as parameters
  • fix: minor adjustments to prevent warnings and errors with the latest purrr and tidyselect

tidywikidatar 0.5.4

CRAN release: 2022-08-06

tidywikidatar 0.5.3

CRAN release: 2022-06-07

  • new feature: get identifiers of pages that are members of a category on Wikipedia with tw_get_wikipedia_category_members()
  • fix: return a data frame of a single row filled with NA values when tw_get_qualifiers() would return no information on a given item/property combination; this ensures caching works as expected also in such cases.
  • fix: error in caching functions apparently due to dbplyr update/incompatibility with the pool package

tidywikidatar 0.5.2

CRAN release: 2022-05-18

  • new feature: tw_get_p_wide() offers a more efficient way to get wide data frames with a number of properties (with their labels) for a set of Wikidata ids; it also facilitates collapsing values for ease of export to csv.
  • new: introduce tw_get_p1() as a shorthand for tw_get_p(only_first = TRUE, preferred = TRUE)
  • new: when pre-cached data is given to functions such as tw_get_property() or tw_get_label(), they look up any items that are missing from the given data frame
  • new: introduce cache indexing convenience functions, tw_index_cache_item() and tw_index_cache_search(). These should significantly improve performance on the caching backend, but they are currently introduced as an opt-in as they have not been thoroughly tested with different database drivers. Running them once drastically improves performance with MySQL when the cache grows to millions of items. See also the vignette on caching.
  • fix: don’t look for cache folder when cache disabled in tw_get_p()
  • fix: better handling of cases when NA (including a vector with only NA) or NULL is given as input to various functions
  • fix: when only a partial pre-cached data frame is passed to tw_get() and other functions as input, missing items are queried
  • fix: tw_get_wikipedia() consistently returns vector of the same length as input
  • fix: rank column mistakenly dropped in some cases when using tw_get_property()
  • fix: when caching language is set manually via parameter (instead of via environment variable), a separate SQLite database is used, as expected
  • new: tw_check_qid() has more parameters to deal with different use cases
  • new: new set of functions to add indexes to databases to improve performance when the local cache grows to many millions of rows (tw_check_cache_index(), tw_index_cache_item(), and tw_index_cache_search()). Tested with MySQL.
  • all _single functions used internally are now not exported to facilitate auto-complete
  • new maintainer email address due to CRAN issues

tidywikidatar 0.5.1

CRAN release: 2022-03-11

  • introduce functions to get list of sections of Wikipedia pages - tw_get_wikipedia_page_sections() - and then extract the links from a specific section - tw_get_wikipedia_page_section_links()
  • introduce convenience function to get all Wikidata items that have a given property, irrespective of the value, tw_get_all_with_p()
  • tw_search() now has separate parameters for the search language and for the language in which label and description are returned (previously, these were always in English)
  • fix tw_get_qualifiers() when qualifier value is of type quantity
  • keep smooth caching also when a Wikidata item has no values and no label in the current language
  • introduce additional settings in database connections for drivers that need them
  • minor bug fixes

tidywikidatar 0.5.0

CRAN release: 2022-01-26

  • new vignette with examples starting from a Wikipedia page, with more examples using piped operations, and a clearer reference that separates groups of functions on the official documentation website
  • tw_get() now keeps rank by default, facilitating retrieving more relevant results
  • tw_get_qualifiers() now includes ranking such as “preferred”, “normal”, and “deprecated” associated with each property, as well as value type of the output (new format incompatible with previous cache, reset cache with tw_reset_qualifiers_cache())
  • tw_get_qualifiers() now correctly returns the value when the qualifier value type is a string (not a Wikidata identifier, not a date)
  • tw_get_image() now respects all parameters consistently. tw_get_image() and tw_get_image_metadata() are now briefly described in the README.
  • tw_get_wikipedia_page_links() now caches results, and provides more consistent results as a data frame
  • introduce tw_get_p() as an alias of tw_get_property_same_length() for brevity
  • introduce new parameters in tw_get_p() to deal with the common pattern where only the “preferred” or most recent property should be returned, rather than whatever Wikidata has first in the list, and add a new section in the README
  • add example datasets for illustrative purposes and forthcoming additional vignettes and examples, tw_qid_meps and tw_qid_airports
  • drop legacy include_id_and_p parameter from tw_get_qualifiers()
  • add support for setting database connection parameters with tw_set_cache_db() for easier use of alternatives to SQLite
  • new naming convention for local SQLite database, to reflect change that there is now one database per language
  • improve handling of connections to reduce the risk of connections remaining open, aiming at higher efficiency for integration with shiny apps
  • bug fix: fix error when Wikidata item has no label in any language

tidywikidatar 0.4.2

CRAN release: 2021-09-14

  • tw_get_image() now returns consistently valid links if format is set to ‘embed’; it is now possible to get a direct link to images with a given resolution with the width parameter
  • introduce tw_get_image_metadata() to obtain adequate credits to be included when images are used
  • introduce functions to get Wikidata identifiers from Wikipedia pages
  • tw_get_property() now consistently returns data frame with properties in the same order as given and does not fail if given invalid values
  • more consistent testing, less likely to be impacted by changes in Wikidata items
  • updated README with examples using the pipe operator

tidywikidatar 0.4.1

CRAN release: 2021-08-04

  • bug fix release: tw_get_label() now actually always returns a vector of the same length as input
  • new functions used internally for consistency and preventing the above issue under different scenarios
  • invalid identifiers and NA are now ignored by tw_get()

tidywikidatar 0.4.0

CRAN release: 2021-07-31

tidywikidatar 0.3.0

CRAN release: 2021-06-07

  • much more efficient caching
  • introduces partial compatibility with other caching backends via DBI
  • introduces progress bars
  • report informative error messages from server
  • tests fail without throwing errors if Wikidata server cannot be reached
  • new caching vignette

tidywikidatar 0.2.0

CRAN release: 2021-04-19

  • more efficient when input includes repeated ids
  • better documentation and testing

tidywikidatar 0.1.0

  • all outputs as data frame or vector
  • caches data locally
  • facilitates basic queries