Several free tools are available to collect and analyze data from the main social platforms (Facebook, Twitter, WhatsApp and YouTube, as well as newer platforms such as TikTok): Crowdtangle (the best way to get data from Facebook), Orange, Socialblade, Gephi (for building graphs) and DMI Tools (which gives access to data from various sources such as Wikipedia and YouTube). Independent researchers and/or those without programming skills are advised to use Orange, DMI Tools and Gephi to monitor the debate, especially on Twitter. Those who want to deepen the investigation and apply new methodologies will need Python, which gives more control over what happens at each step.
Facebook does not offer an API (Application Programming Interface), that is, the access to data that social media companies grant without researchers having to keep it in their own database. Facebook disabled this option after the Cambridge Analytica scandal and now grants access to its data only to researchers who use Crowdtangle, which requires the institution to be approved (approval can take approximately 2 days to 1 week). Crowdtangle also gives access to data from Reddit and Instagram; for Facebook, it does not provide data from comments.
On Twitter, access to data is much more open: both historical and real-time data are available, but with limits. For example, the new API under development, API V2, gives access to only 500,000 tweets per month, and researchers must still apply to gain access.
YouTube works differently: its API uses quotas to control access to data. Some free tools exist, and the application for permission is reviewed automatically. Researchers can also develop their own tools, but the quotas granted at first are low, and access can be revoked if the platform believes that the research is not valuable to the public and/or the platform.
WhatsApp does not have an API, so there is no official way to access its data. The platform raises ethical issues because researchers can end up accessing data from users who do not know that their information is being used in an investigation; this is the most complex aspect of the investigative process.
When choosing paid tools, consider the social media platforms they cover (most offer Twitter, Facebook and Instagram), the volume of data and the options for exporting it (some impose limits), access to historical data, and analytics capabilities.
The research design is a step that must be completed before collecting the data. Its steps are: 1) Define the topic, 2) Explore the discourses, 3) Identify strategies, 4) Write the query, 5) Validate the query, and 6) Implement the query.
In the first step, the researcher defines what to look for when monitoring, and builds the research questions, hypotheses, objectives, etc. for that purpose. The object of study can be an event (e.g. elections), a topic (e.g. hate speech debates on the internet) or a set of actors (e.g. candidates, activists, authorities, companies, state institutions, etc.). Sometimes all three categories must be included in the design, and each can lead to broader topics and subtopics that must also be analyzed.
The second step is to explore the discourse around the chosen topic, that is, what has been said about it. The focus is on “where” these discourses are taking place and “who” is hosting them. This usually means searching online sources such as institutional web pages, social media (for example, Facebook pages) and media platforms.
The third step is to identify linguistic discourse strategies. It focuses on the form the discourses take: “what” is said about the topic and “how” it is said. Linguistic decisions must be made about which words, phrases, proper names, slang, hashtags, accounts, etc. to use. The variations that users may employ and potential typing errors must also be considered.
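As an illustration of how spelling variations and typing errors might be handled, the following minimal Python sketch normalizes tokens (lowercase, no accents, no leading “#” or “@”) and then uses fuzzy matching from the standard library to find observed tokens that are close to a seed term. The function names, the 0.8 similarity cutoff and the example tokens are assumptions made for this illustration, not part of the methodology described above.

```python
import unicodedata
from difflib import get_close_matches
from typing import Dict, List


def normalize(term: str) -> str:
    """Lowercase a term, strip accents, and drop a leading '#' or '@'."""
    decomposed = unicodedata.normalize("NFKD", term.lower())
    no_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return no_accents.lstrip("#@")


def likely_variants(seed: str, observed: List[str], cutoff: float = 0.8) -> List[str]:
    """Return the observed tokens whose normalized form is close to the seed.

    `cutoff` is a hypothetical similarity threshold (0 to 1); lower values
    catch more typos but also more unrelated words.
    """
    # Group the original tokens by their normalized form.
    by_norm: Dict[str, List[str]] = {}
    for token in observed:
        by_norm.setdefault(normalize(token), []).append(token)

    close = get_close_matches(
        normalize(seed), list(by_norm), n=max(len(by_norm), 1), cutoff=cutoff
    )
    return sorted(token for key in close for token in by_norm[key])


# Hypothetical tokens seen while exploring the discourse.
tokens = ["#Eleições", "eleicoes", "eleicoe", "futebol"]
variants = likely_variants("eleições", tokens)
```

The resulting list is only a set of candidates; a researcher would still need to review it by hand before adding any variant to the query.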
The fourth step is to write the query, which is the syntactic, linguistic element of the investigation. Its purpose is to translate the linguistic discourse strategies into a syntax that gives access to the information. Search words are combined with logical operators such as “and”, “or” and “not”, which represent the relationship between two terms. When constructing the syntax, morphological, lexical and semantic principles must be considered, as well as variations and other linguistic aspects, in addition to the moment when the discourse took place and the moment of data collection.
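The combination of search words with logical operators can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the helper names, the example terms and the exact output syntax (uppercase AND, OR, NOT, with double quotes around phrases) are hypothetical, and each collection tool has its own query syntax that should be checked in its documentation.

```python
from typing import Sequence


def quote(term: str) -> str:
    """Wrap multi-word phrases in double quotes so they are matched as a unit."""
    return f'"{term}"' if " " in term else term


def build_query(
    any_of: Sequence[str],
    all_of: Sequence[str] = (),
    none_of: Sequence[str] = (),
) -> str:
    """Combine term lists into a single Boolean query string.

    any_of:  at least one of these terms must appear (joined with OR)
    all_of:  every one of these terms must appear (joined with AND)
    none_of: none of these terms may appear (each prefixed with NOT)
    """
    if not any_of:
        raise ValueError("any_of must contain at least one term")

    parts = ["(" + " OR ".join(quote(t) for t in any_of) + ")"]
    parts += [quote(t) for t in all_of]
    query = " AND ".join(parts)
    for term in none_of:
        query += f" NOT {quote(term)}"
    return query


# Hypothetical terms for a study of hate speech during an election.
query = build_query(
    any_of=["hate speech", "#hatespeech"],
    all_of=["election"],
    none_of=["giveaway"],
)
```

Keeping the three term lists separate from the final string makes it easier to revise the query later, since adding or removing a term does not require editing the syntax by hand.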
The fifth step is to validate the query. This consists of using different tools to verify that the research objective is met and that no word, hashtag, etc. has been overlooked, and to identify superfluous data that is not relevant to the research, in order to decide whether logical connectors should be included or excluded. The query serves both to collect data and to classify it.
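One possible way to carry out this check, sketched here in Python, is to apply the query terms to a small sample of posts that has been labeled by hand as relevant or not, and then list the superfluous posts (matched but not relevant) and the missed posts (relevant but not matched). The function names, the sample posts and the simple case-insensitive substring matching are assumptions made for illustration; real tools tokenize and match text differently.

```python
from typing import Dict, List, Sequence, Tuple


def matches(
    text: str,
    any_of: Sequence[str],
    all_of: Sequence[str] = (),
    none_of: Sequence[str] = (),
) -> bool:
    """Return True if the text satisfies the OR / AND / NOT term lists.

    Matching is a plain case-insensitive substring test (a simplification).
    """
    lowered = text.lower()

    def has(term: str) -> bool:
        return term.lower() in lowered

    return (
        any(has(t) for t in any_of)
        and all(has(t) for t in all_of)
        and not any(has(t) for t in none_of)
    )


def validate(
    sample: List[Tuple[str, bool]],
    any_of: Sequence[str],
    all_of: Sequence[str] = (),
    none_of: Sequence[str] = (),
) -> Dict[str, List[str]]:
    """Compare query matches against hand labels of (text, is_relevant)."""
    superfluous = [
        text for text, relevant in sample
        if matches(text, any_of, all_of, none_of) and not relevant
    ]
    missed = [
        text for text, relevant in sample
        if relevant and not matches(text, any_of, all_of, none_of)
    ]
    return {"superfluous": superfluous, "missed": missed}


# Hypothetical hand-labeled sample: (post text, relevant to the study?).
sample = [
    ("Hate speech rose during the election", True),
    ("Win a free speech course, giveaway!", False),
    ("Discurso de ódio nas eleições", True),
]
report = validate(sample, any_of=["speech"])
```

In this sketch, a superfluous post suggests adding an exclusion term, while a missed post suggests adding a new word or variant to the query.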
The last step is to implement the query to collect and/or classify the data using different tools, whether paid or free.
Recommended reading
Department of Public Policy Analysis of Fundação Getulio Vargas. (2021). Hate Speech in Digital Environments. Retrieved from: https://democraciadigital.dapp.fgv.br/wp-content/uploads/2021/03/EN-Estudo-3-I-Discurso-de-Odio-Ficha-e-ISBN.pdf