The easiest way to extract tweets regualarly and deploy a shiny app that facilitates exploring them is to use a server and run services with Docker.

In order to facilitate the process, quotefinder comes with dedicated and customisable Docker containers.

Getting the tweets

After having a server running, with Docker enabled, and having created a user with the rights to run docker containers, you should create a folder, called qf_data. At this stage, the folder should contain three files:

  1. an R file, with the sole purpose of running a rmarkdown document. The full content of the file can be as follows rmarkdown::render(input = "/home/qf/qf_data/qf_get_tweets.Rmd"). See example.

  2. an rmarkdown file (see example). It will likely be a basic document, which may just include a call to qf_get_tweets_from_list() or similar function.

  3. a twitter token, stored e.g. as twitter_token.rds.

An rmarkdown file is, of course, not stricly speaking necessary, but I find it makes it much easier to deal with issues such as folder locations, and allows to detail subsequent steps in an ordered fashion, no matter how complicated the procedure at hand. The R file can of course include calls to more than one rmarkdown files if this feels easier to manage. If useful, rmarkdown output files can be stored locally as a form of logging.

Dockerised Rstudio with pre-loaded quotefinder

In order to check if everything is in order, you may want to have a look at your server in an interactive R session.

Run this command if you prefer to keep files in your user directory on the server.

Or run this command to use Docker volumes

N.B. keep qf as the user, but do change password! ies5EshaiPh6 is just a random string. You can change the port where to make available Rstudio by changing the first part of 28788:8787 in the command above (add -d if you want this rstudio session to continue running). The Dockerfile used to create this image is available on GitHub.

You can then access the rstudio session using the ip (or the domain name) associated with your host, like http://123.123.123.123:28788

In order to get some example files, you can download them to the qf_data folder by running these data in the newly opened Rstudio session in your browser.

download.file(url = "https://raw.githubusercontent.com/giocomai/quotefinder/master/inst/qf_get_tweets/qf_data/qf_get_tweets.R",
              destfile = "qf_data/qf_get_tweets.R")

download.file(url = "https://raw.githubusercontent.com/giocomai/quotefinder/master/inst/qf_get_tweets/qf_data/qf_get_tweets.Rmd",
              destfile = "qf_data/qf_get_tweets.Rmd")

download.file(url = "https://raw.githubusercontent.com/giocomai/quotefinder/master/inst/qf_get_tweets/qf_data/qf_pre_process_tweets.Rmd",
                destfile = "qf_data/qf_pre_process_tweets.Rmd")

download.file(url = "https://raw.githubusercontent.com/giocomai/quotefinder/master/inst/qf_get_tweets/qf_data/create_token.Rmd",
              destfile = "qf_data/create_token.Rmd")

If you already have a twitter token, you can copy it to the qf_data subfolder as twitter_token.rds. If not, now it’s the right time to create one. You can do so by opening the create_token.Rmd file that you should find in the qf_data folder.

At this stage, you can customise the qf_get_tweets.Rmd to download the tweets you are interested in.

Running the command in qf_get_tweets.R should process the rmarkdown file and store tweets locally.

You can use this interface to customise scripts, and troubleshoot any issues. If everything looks fine, you can proceed to the next section: having tweets updated regularly.

Have tweets updated regularly with cron

While in principle it is possible to have a constantly running docker container, with cron running inside it, it is more advisable (and more efficient) to let the host machine handle the cron part, and start up the relevant docker container, have it update tweets, and then stop immediately.

First, let’s run the docker container tasked with updating tweets in an interactive session to check again if everything is running smoothly.

All good? What about having the host updating tweets every hour?

From the terminal on your host, run:

At the end of the document, add:

or, if you want to have a logfile for troubleshooting:

Leave an empty line at the end the crontab, save, and… you’re done. This will update tweets every time an hour starts (e.g. at 8.00, at 9.00, etc.).

If you are unfamiliar with cron, in order to customise frequency and time consider using cronR.