You need to use double quotation marks to search for exact strings. For example, if your query is 'cats and dogs', format it as '"cats and dogs"' in `config.yml`.
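For instance, the relevant `config.yml` entry might look like the following sketch (the `query` key name is an assumption here; check the field names in your own `config.yml`):

```yaml
# Hypothetical excerpt from config.yml -- the 'query' key name is assumed.
# Double quotes inside the single-quoted YAML string make the search exact.
query: '"cats and dogs"'
```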
Yes - just ensure that your strings are separated by commas and enclosed within square brackets, like so:
['cats OR dogs', 'kittens and puppies', 'chickens and chicks']
DATA will run a separate search for each query and upload all data to the dataset specified in `config.yml`.
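As a sketch, a multi-query entry in `config.yml` might look like this (again, the key name is assumed):

```yaml
# Hypothetical excerpt from config.yml: a list of separate queries.
# DATA would run one search per list item.
query: ['cats OR dogs', 'kittens and puppies', 'chickens and chicks']
```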
No - if you do not want any data processing done, Twarc is your best option. DATA uses the Twarc library to collect Twitter data, but it is designed to process those files and store them in Google BigQuery.
This tool was designed specifically for researchers who requested a largely code-free option for streamlining the gathering, processing, and storage of Twitter data. For this reason, configuration is set in `config.yml` rather than on the command line.
Yes, you can move all but the newest file in your collection directory to a backup location. If you have to stop and restart DATA after moving files out of your collection, make sure to update the start date in `config.yml` so that you don't accidentally re-collect the data you have backed up.
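If you'd rather script the backup, here is a minimal Python sketch (all paths are placeholders to adjust for your own layout; only `collected_json` matches the directory names used elsewhere in this FAQ):

```python
# Minimal sketch: move every file except the most recently modified one
# from the collection directory to a backup location. Paths are placeholders.
import shutil
from pathlib import Path

collection_dir = Path("DATA_collector/my_collections/your_directory/collected_json")
backup_dir = Path("collection_backup")  # hypothetical backup location
backup_dir.mkdir(parents=True, exist_ok=True)

# Sort by modification time; everything except the newest file gets moved.
files = sorted(collection_dir.iterdir(), key=lambda f: f.stat().st_mtime)
for f in files[:-1]:
    shutil.move(str(f), str(backup_dir / f.name))
```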
You can search for anything, as long as it meets Twitter's rules for How to Build a Query.
According to Twitter, the counts endpoint is not subject to the same compliance checks as the search endpoint. For this reason, your counts are likely to be slightly higher than the number of tweets you actually gather.
I lost my connection/Windows forced an update/something went wrong during processing/uploading to BigQuery. Do I have to re-collect these files?
If you've had an external issue, and this issue occurred during the processing or uploading stage, you can move your collected json files from `DATA_collector/my_collections/your_directory/collected_json` to `DATA_collector/json_input_files`. Run DATA again and select option 2 to process the files without re-collecting. If the issue was a program error, please create an issue and attach your log file. Log files are located in `DATA_collector/logging`.
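If you'd like to script that move, here is a minimal Python sketch (the `your_directory` segment is a placeholder for your own collection directory):

```python
# Minimal sketch: move already-collected json files into the input directory
# so DATA can re-process them (option 2) without re-collecting.
import shutil
from pathlib import Path

src = Path("DATA_collector/my_collections/your_directory/collected_json")
dst = Path("DATA_collector/json_input_files")
dst.mkdir(parents=True, exist_ok=True)

for f in src.glob("*.json"):
    shutil.move(str(f), str(dst / f.name))
```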