TSA is a command line application that acquires Tweets from a list of Tweet IDs using distributed Tor circuits. It essentially provides command line access to the features of the nitscrape library, which contains all the actual network management and scraping code.
TSA was used to obtain 22 006 475 Tweets relating to Brexit from the period 2016-2020, at a maximum download rate of 1200 Tweets per minute. A breakdown of common keywords over time can be viewed in the words.csv file. The ignore.csv file contains all the words that were ignored when compiling the keyword list.
words.csv is formatted for readability: columns are aligned and numbers are written in a more readable syntax. Each word is specified as <word> (<n>|<r>), where n is the total number of occurrences of that word in the given 4 day period and r = n / (total Tweets in the period), i.e. the number of occurrences of the word per Tweet about Brexit in that 4 day period.
If the data is to be processed, words_raw.csv is preferable since it is formatted for more efficient parsing. Each item is specified as <word>;<n>;<r>, with n and r as defined above.
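As a rough illustration, a single item in this format could be parsed in Python as sketched below. This assumes the item has already been isolated from its row (the surrounding column layout of words_raw.csv is not described here), and that n is written as a plain integer and r as a plain decimal. The names WordStat and parse_item are hypothetical and are not part of TSA or nitscrape.

```python
from typing import NamedTuple


class WordStat(NamedTuple):
    word: str
    n: int    # total occurrences of the word in the 4 day period
    r: float  # occurrences of the word per Tweet in that period


def parse_item(item: str) -> WordStat:
    """Parse one '<word>;<n>;<r>' item (hypothetical helper)."""
    # Split from the right so that a word containing ';' is kept intact.
    word, n, r = item.strip().rsplit(";", 2)
    return WordStat(word, int(n), float(r))
```

For example, with a made-up item, parse_item("brexit;1200;0.4") would give WordStat(word="brexit", n=1200, r=0.4).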
The tweet_count field specifies the total number of Tweets about Brexit in the given period.
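The relationship between n, r and tweet_count can be sketched in Python as follows. The function name occurrences_per_tweet is hypothetical, and returning 0.0 for a period with no Tweets is an assumption made for this sketch, not documented behaviour of TSA.

```python
def occurrences_per_tweet(n: int, tweet_count: int) -> float:
    """Return r = n / tweet_count for one 4 day period (hypothetical helper)."""
    # Assumption: a period without any Tweets yields r = 0.0 rather than an error.
    if tweet_count == 0:
        return 0.0
    return n / tweet_count
```

With made-up numbers, a word that occurs 50 times in a period containing 200 Tweets has r = 0.25.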