Gender differences in online communication: A case study of Soccer (2024)

Mariana Macedo¹, Akrati Saxena²

Abstract

Social media and digital platforms allow us to express our opinions freely and easily to a vast number of people. In this study, we examine whether there are gender-based differences in how people communicate about soccer on Twitter. Soccer is one of the most popular sports, and on social media it therefore engages a diverse audience regardless of technical knowledge. We collected three months (March-June) of English and Portuguese Twitter data containing 9.5 million soccer-related tweets; only 18.38% of these tweets were identified as belonging to women, suggesting a gender gap already in the number of people who actively participate in this topic. We then conduct a fine-grained text-level and network-level analysis to identify gender differences in communication on Twitter. Our results show that women express their emotions more intensely than men, regardless of the differences in volume. The network generated from Portuguese tweets has lower homophily than the English one. However, this difference in homophily does not affect how women express their emotions and sentiments, suggesting that these aspects are inherent norms or characteristics of genders. Our qualitative and quantitative analyses reveal further gaps, highlighting the importance of examining and reporting gender gaps in online communication to create a more inclusive space where people can openly share their opinions.

Introduction

Social media platforms, such as Twitter (currently known as X), Instagram, and Reddit, have emerged as significant channels for self-expression. Their versatility, convenience, and widespread accessibility encourage people to express themselves and garner reactions from their target audience. Fans, sports organizations, and players rely heavily on these platforms for interactive communication and for real-time information about ongoing events (Williams, Chinn, and Suleiman 2014; Wold et al. 2016). Breaking news also often appears on Twitter before it reaches traditional media channels, making the platform an excellent medium for instantaneous information from official as well as unofficial sources (Wold et al. 2016).

However, on social media, majority groups often dominate the conversation, and as a consequence, minority groups do not feel as free to express themselves (Lingiardi et al. 2020). Past research has explored gender-related differences in offline as well as online communication spaces, such as Twitter (Holmberg and Hellsten 2015; Garcia, Weber, and Garimella 2014), Stack Overflow (May, Wachs, and Hannák 2019), Wikipedia (Wagner et al. 2015), and YouTube (Thelwall and Foster 2021). For instance, Messias et al. (2017) showed that, in general, white men are more likely to be followed on social media and to acquire higher positions in rankings and ratings on Twitter; only beyond the top 14% most followed users are there higher fractions of women than men. The same pattern appears within specific topics (e.g., sports, politics, news, technology, industry, and art) (Nilizadeh et al. 2016; Manzano and Sánchez-Giménez 2019), where women tend to be less influential and make up a lower percentage of the top influential users. Thus, male thoughts and narratives linger longer on Twitter and are more likely to gain traction, probably due to this participation imbalance and to gender biases and constructs (Garcia, Weber, and Garimella 2014).

Many social, contextual, and developmental (Chaplin 2015) factors play a role in perpetuating biases and stereotypes in society, which are then manifested on social media. Some fields and topics, such as sports, STEM (Science, Technology, Engineering, and Mathematics), and politics, are more likely to be associated with men (Wagner et al. 2015), which facilitates their participation in conversations. In these contexts, women can therefore feel more hesitant to talk about such topics and to take up certain professions because of gender norms and constructs. As a consequence, skewed narratives and perspectives are reinforced and become embedded in subconscious behaviour. To counter this reinforcement, we need to be aware of gender differences so that we can promote diversity, inclusivity, and fairness on online platforms.

We are interested in analyzing the gender gap in online communication, and we focus our analysis on a topic that is widely discussed worldwide: soccer (or football). The last World Cup was watched by more than 4 billion people out of a world population of almost 8 billion (Chanis Online, accessed on 2-Aug-2022), indicating the worldwide importance of the topic.

For our study, we collected soccer-related tweets for three months using the Twitter API. Our dataset includes 7 million English tweets from 2 million unique users and 2.6 million Portuguese tweets from 0.5 million unique users. Our analysis explores gender-based communication differences in both languages. We chose Portuguese as the second language because, in Brazil and Portugal, individuals are exposed to soccer from an early age, which may encourage people to love the sport and participate more actively than in cultures where soccer is less popular.

Moreover, some papers have found active participation by women in soccer-related topics. Yoon et al. (2014) showed that the main motivation for women to use Twitter is entertainment during leisure time, rather than information and fanship. Clavio, Walsh, and Coyle (2013) found that female fans were more likely than men to contribute to sports team feeds on Twitter. Female fans also rated a variety of informational and commercial functions, such as in-game updates, game results, individual player news, contests, giveaways, and ticket discounts, more highly than men did. The same study found that female fans were significantly more active on social media via smartphones while at the stadium and more likely than male fans to respond to tweets sent out by team Twitter feeds. In this paper, we investigate these ideas further to understand how women, as a minority, might find space to be active in a male-dominated topic such as soccer, and how this affects communication behaviour.

Women make up almost half of the population and 38% of the soccer fan base (Lange 2020), yet only 18% of the tweets we observed were from women. Across our dataset, women were much less likely than men to talk about soccer and exhibited more retweeting behaviour. This may be evidence of an even starker gender gap. In this study, we investigate various ways in which men and women communicate differently, such as the type of content, frequency of content, and behaviour in a community. We conduct three levels of analysis (text-level, network-level, and event-level) to gain insights and answer the research questions. We observe that the Portuguese network has lower homophily than the English network, suggesting that more interaction between men and women might be related to the popularity of a topic, making it more inclusive. However, in both languages, we observe that women show more intense emotions than men and that the language of the tweets plays only a small role in gender differences related to emotions and sentiments.

Here, we wish to drive communication about this topic to encourage a healthier and more equitable online environment. We look into the factors that make a space comfortable for everyone to express themselves freely without fear of judgment. UNESCO launched the #ChangeTheGame campaign to encourage gender equality in sports, as it believes that football can be an effective medium to reduce the gender gap and empower women worldwide (Schischlik 2023). Analyzing the evolution of football-related communication can help us create a safe space online through interventions and encourage participation from everyone. This might ignite change at the grassroots level and amplify women's involvement in the game at all levels: as players, fans, sports experts, coaches, and members of governance organizations. Recognizing the privacy, ethical, and computational challenges involved, we limit our current analysis to binary genders, and we acknowledge the importance of expanding to a broader gender spectrum in future studies.

Our paper is divided into four parts. In the next section, we present our dataset, followed by our findings. The paper concludes with a discussion, including future directions.

Soccer-Tweet Dataset

We collected data from Twitter over a span of three months (March 7 to June 5, 2022) using the following query: “#Soccer OR football OR futebol OR #soccer OR #football OR #futebol OR #futball OR #futbol OR #fifa OR #CBF OR #brasileirao OR #CopaIntelbrasDoBrasil OR #Libertadores OR #Sudamericana OR #CampeonatoDoBrasileiro OR #CopaAmerica OR #ConfedCup OR #UEFACup OR #Supercup OR #WorldCup OR #LaLiga OR #CopadelRey OR #CoppaItalia OR #CoupedeFrance OR #premierleague OR #championsleague OR #FACUP OR Confederations Cup OR #confederationscup OR #ConfedCup OR Copa del Rey OR EFL cup”. We performed gender detection using Genderize (Genderize.io) and Namepedia (namepedia.org) and then discarded tweets whose users could not be assigned a gender (male or female). After these steps, the English dataset (also referred to as en) contained 6,957,598 tweets: 5,767,122 by males and 1,190,476 by females, so tweets by females made up 17.11% compared to 82.89% by males. Over the three months, there were 421,996 unique female users and 1,556,818 unique male users. Similarly, the Portuguese dataset (also referred to as pt) contained 2,572,247 tweets, of which 2,011,286 were from males and 560,961 (21.81%) from females, posted by 365,045 unique male users and 148,539 (28.92%) unique female users. In both languages, men are the majority in terms of users and tweets, both overall and per week (Figure 1).
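The dataset-construction step (assign a gender to each author, discard unassigned authors, then compute tweet and user shares) can be sketched as below. This is a minimal illustration under stated assumptions: `NAME_GENDER` is a hypothetical lookup standing in for the Genderize and Namepedia results, and taking the first token of the display name as the first name is a simplification; neither is the authors' actual pipeline.

```python
from collections import Counter

# Hypothetical name -> gender lookup standing in for Genderize/Namepedia output.
NAME_GENDER = {"ana": "female", "maria": "female", "joao": "male", "john": "male"}


def first_name(display_name):
    """Lower-cased first token of a display name (a simplifying assumption)."""
    parts = display_name.strip().split()
    return parts[0].lower() if parts else ""


def assign_and_summarize(tweets):
    """tweets: iterable of (user_id, display_name) pairs, one per tweet.

    Discards tweets whose author cannot be assigned a gender, then returns
    tweet counts, unique-user counts, and the female share of each.
    """
    tweet_counts = Counter()
    users = {"female": set(), "male": set()}
    for user_id, name in tweets:
        gender = NAME_GENDER.get(first_name(name))
        if gender is None:
            continue  # unassigned author: the tweet is dropped
        tweet_counts[gender] += 1
        users[gender].add(user_id)
    n_tweets = sum(tweet_counts.values())
    n_users = sum(len(u) for u in users.values())
    return {
        "tweets": dict(tweet_counts),
        "users": {g: len(u) for g, u in users.items()},
        "female_tweet_share": tweet_counts["female"] / n_tweets if n_tweets else 0.0,
        "female_user_share": len(users["female"]) / n_users if n_users else 0.0,
    }


sample = [(1, "Ana Silva"), (1, "Ana Silva"), (2, "John Smith"),
          (3, "Unknown Person"), (4, "Maria")]
result = assign_and_summarize(sample)
```

As a check of the arithmetic above, 1,190,476 / 6,957,598 ≈ 0.1711, which is the 17.11% female share of English tweets.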

Retweeting Behavior

We also separated retweets and replies to examine whether they follow a pattern distinct from that of the original tweets. In English, 68.91% of female tweets were retweets compared to 60.25% of male tweets; in Portuguese, the corresponding figures were 59.07% and 55.08%. Replies accounted for 15.87% of female tweets versus 23.21% of male tweets in English, and 20.67% versus 37.87% in Portuguese. Thus, women tend to retweet more than men, whereas men reply more than women.
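The retweet and reply percentages are per-gender shares: each count is divided by the total number of tweets of that gender, not by the whole dataset. A minimal sketch, assuming each tweet has already been labeled `original`, `retweet`, or `reply` (the labels and the function name are illustrative):

```python
from collections import Counter, defaultdict


def interaction_shares(tweets):
    """tweets: iterable of (gender, kind), kind in {"original", "retweet", "reply"}.

    Returns {gender: {kind: percentage of that gender's tweets}}; each gender's
    percentages are normalised by its own tweet total.
    """
    counts = defaultdict(Counter)
    for gender, kind in tweets:
        counts[gender][kind] += 1
    shares = {}
    for gender, per_kind in counts.items():
        total = sum(per_kind.values())
        shares[gender] = {kind: 100 * n / total for kind, n in per_kind.items()}
    return shares


# Toy data: 10 female tweets (7 retweets, 2 replies, 1 original), 4 male tweets.
toy = ([("female", "retweet")] * 7 + [("female", "reply")] * 2
       + [("female", "original")] + [("male", "retweet")] * 3 + [("male", "reply")])
shares = interaction_shares(toy)
```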

Our Findings

We divide our analysis into two parts: analyzing the content of the tweets and analyzing the networks generated from user interactions through retweets and replies.

Text Analysis

We first analyze the tweet text using emojis, hashtags, sentiments, emotions, and other linguistic attributes.

Emoji Usage Analysis

A picture is worth a thousand words: the emojis we use today in online communication are more than tiny pictures; they symbolize emotions. Emoji usage has grown so rapidly that Oxford Dictionaries declared the Face with Tears of Joy emoji (😂) the word of the year for 2015 (Oxford University Press 2015). The scientific research community has also shown interest in analyzing emoji usage patterns in communication (Robertson, Magdy, and Goldwater 2021a, b; Robertson et al. 2021; Chen et al. 2018).

In this work, we use the Emot library (Shah and Rohilla 2022) to extract emojis from the tweet text and then analyze how emoji usage varies between the two genders. In the English data, around 32% of tweets by females contained at least one emoji, compared to 29.20% of tweets by males. If we remove retweets, quoted tweets, and replies and look only at original tweets, 29.85% of tweets by females contain emojis, while only 21.42% of original tweets by males do. The results indicate that emoji usage tends to be more prevalent among females in both languages (Table 1). This finding aligns with the study by Chen et al. (2018), who report that female users include emojis in a larger proportion of text messages than men. We also examined the top-20 emojis used by each gender and the percentage of tweets containing each of them, which gives insight into the types of emojis males and females use in soccer communication.

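Language | Gender | Tweets | % with emoji | Avg. # of emojis (avg. including 0s)
En | female | original | 29.85% | 2.42 (0.72)
En | female | all | 31.91% | 2.35 (0.75)
En | male | original | 21.43% | 2.44 (0.52)
En | male | all | 26.28% | 2.32 (0.39)
Pt | female | original | 22.59% | 3.11 (0.70)
Pt | female | all | 22.43% | 3.56 (0.80)
Pt | male | original | 17.79% | 2.75 (0.48)
Pt | male | all | 16.19% | 2.78 (0.45)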
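The emoji table reports two averages that are easy to confuse: the mean number of emojis among tweets that contain at least one emoji, and, in parentheses, the mean over all tweets including those with zero emojis. The second equals the first multiplied by the share of tweets with emojis (for example, 29.85% × 2.42 ≈ 0.72 for original English tweets by females). The sketch below, which assumes emojis have already been extracted into one list per tweet, computes all three quantities; `emoji_stats` is a hypothetical helper, not part of the Emot library.

```python
def emoji_stats(emoji_lists):
    """emoji_lists: one list of already-extracted emojis per tweet.

    Returns (percentage of tweets with at least one emoji,
             mean emojis per emoji-containing tweet,
             mean emojis per tweet, counting tweets with zero emojis).
    """
    counts = [len(emojis) for emojis in emoji_lists]
    n = len(counts)
    nonzero = [c for c in counts if c > 0]
    pct = 100 * len(nonzero) / n if n else 0.0
    mean_nonzero = sum(nonzero) / len(nonzero) if nonzero else 0.0
    mean_all = sum(counts) / n if n else 0.0
    return pct, mean_nonzero, mean_all


pct, mean_nonzero, mean_all = emoji_stats([["😂", "😂"], [], ["❤️"], []])
# The "including 0s" mean is the emoji share times the per-emoji-tweet mean.
assert abs(mean_all - pct / 100 * mean_nonzero) < 1e-12
```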

Emojis are also used in the subsequent sentiment and emotion analyses. We keep emojis in their raw form when computing sentiment with VADER (Hutto and Gilbert 2014), since the model can handle emojis. For emotion analysis, the emojis were converted to their equivalent emotion text so that they could be captured by the NRC Emotion Intensity Lexicon (Mohammad 2018).

Hashtag Usage

Hashtags are a pivotal part of communication on Twitter and help people easily follow content related to a topic through its specific hashtags (Twitter). In our dataset, the 7.53 million tweets contain almost 3 million hashtags, and 20.77% of the tweets have at least one hashtag. 23.12% of tweets by females had at least one hashtag, compared to 20.28% of tweets by males, indicating marginally higher hashtag usage by women. However, both women and men are more likely to use one hashtag per tweet than several. The average number of hashtags is similar across genders: 0.8587 for females and 0.8691 for males in English, and 0.1686 for females and 0.1518 for males in Portuguese.
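One way to obtain hashtag statistics of this kind is sketched below. The regular expression (a `#` followed by word characters) is a simplifying assumption rather than Twitter's exact hashtag rules, and the sketch does not claim to reproduce the authors' counting procedure.

```python
import re

# Simplified hashtag pattern (assumption): '#' followed by letters, digits or '_'.
HASHTAG_RE = re.compile(r"#(\w+)")


def hashtag_stats(texts):
    """Return (percentage of tweets with >= 1 hashtag,
               mean hashtags per tweet including tweets without any)."""
    counts = [len(HASHTAG_RE.findall(text)) for text in texts]
    if not counts:
        return 0.0, 0.0
    pct = 100 * sum(1 for c in counts if c > 0) / len(counts)
    return pct, sum(counts) / len(counts)


pct, mean_tags = hashtag_stats(
    ["Gol! #futebol #Libertadores", "what a match", "#WorldCup tonight", "no tags"]
)
```

Running it separately on each gender's tweets gives per-gender shares that are directly comparable.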

Sentiment Analysis

Language | Gender | Positive | Negative | Neutral
En | female | 0.55 | 0.21 | 0.24
En | male | 0.55 | 0.21 | 0.24
Pt | female | 0.10 | 0.05 | 0.85
Pt | male | 0.09 | 0.05 | 0.86

Analyzing the sentiment of content (Saxena, Reddy, and Saxena 2022b) helps us understand public opinions and preferences at a large scale on different topics, such as events, policies, and laws (Stracqualursi and Agati 2022; Cheng et al. 2021; Matalon et al. 2021; Saxena, Reddy, and Saxena 2022a). We use VADER (Valence Aware Dictionary and sEntiment Reasoner) (Hutto and Gilbert 2014) to study how tweets by men and women differ in sentiment. VADER is widely used for social network data because it handles emojis, emoticons, repeated punctuation, negations, contractions, random capitalization, degree modifiers, sentiment-laden slang, and acronyms. It is a lexicon- and rule-based tool that produces a compound score, from which a text is labeled ‘Positive’ (compound score ≥ 0.05), ‘Negative’ (compound score ≤ -0.05), or ‘Neutral’ (compound score between -0.05 and 0.05) (Hutto and Gilbert 2014). Each tweet was thus classified into one of three categories: positive, negative, or neutral.
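The thresholding rule described for VADER is simple enough to state directly in code. The sketch assumes the compound score has already been computed; with the `vaderSentiment` package this is `SentimentIntensityAnalyzer().polarity_scores(text)["compound"]`. `vader_label` and `class_shares` are illustrative names.

```python
from collections import Counter


def vader_label(compound):
    """Map a VADER compound score in [-1, 1] to a class with the usual thresholds."""
    if compound >= 0.05:
        return "positive"
    if compound <= -0.05:
        return "negative"
    return "neutral"


def class_shares(compounds):
    """Fraction of tweets in each sentiment class."""
    labels = Counter(vader_label(c) for c in compounds)
    n = len(compounds)
    return {k: labels[k] / n for k in ("positive", "negative", "neutral")}


# Compound scores would normally come from
# SentimentIntensityAnalyzer().polarity_scores(text)["compound"] (vaderSentiment).
shares = class_shares([0.6, -0.3, 0.01, 0.2])
```

Note that the boundaries are inclusive: a score of exactly 0.05 is positive and exactly -0.05 is negative.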

Overall, the average sentiment scores of both genders are very similar (Table 2), but when we disaggregate the tweets per week, we observe that men post more positive tweets and women post more neutral ones (Figure 3); for negative tweets, neither gender dominates. We also study how each gender expresses sentiment around events (Figure 11 in Appendix A) and observe that whenever an event happens, females show more intense sentiments than males. As examples, we identified some events that fell on days with intense sentiment in English. Events corresponding to intense positive sentiment include Real Madrid winning La Liga for the 35th time and Real Madrid winning the Champions League final; events corresponding to intense negative sentiment include the Algeria-Cameroon match controversy, the accidental death of Pittsburgh Steelers quarterback Dwayne Haskins, and Ukraine's loss to Wales, which sent Wales to the World Cup. Similar results were observed for Portuguese.
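The text does not state the exact rule used to pick days with intense sentiment. As one hypothetical way to shortlist candidate event days, the sketch below averages compound scores per day and gender and flags days whose mean deviates from that gender's across-day average by more than `z` population standard deviations; the z-score criterion and the default `z=2.0` are assumptions.

```python
from collections import defaultdict
from statistics import mean, pstdev


def intense_days(records, z=2.0):
    """records: iterable of (day, gender, compound_score).

    Per gender, flags days whose mean compound score lies more than `z`
    population standard deviations from that gender's mean of daily means.
    The z-score rule is an illustrative assumption.
    """
    by_gender = defaultdict(lambda: defaultdict(list))
    for day, gender, compound in records:
        by_gender[gender][day].append(compound)
    flagged = {}
    for gender, by_day in by_gender.items():
        day_means = {day: mean(scores) for day, scores in by_day.items()}
        values = list(day_means.values())
        mu, sigma = mean(values), pstdev(values)
        flagged[gender] = sorted(
            day for day, m in day_means.items()
            if sigma > 0 and abs(m - mu) > z * sigma
        )
    return flagged


# Toy data: nine calm days and one strongly positive day for one gender.
toy = [(d, "female", 0.0) for d in range(1, 10)] + [(10, "female", 0.9)]
flags = intense_days(toy)
```

Flagged days would still need to be matched to real-world events by hand, as in the examples above.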

Emotion Analysis

Figure 4 shows word clouds of the words most frequently used to convey emotions. Females use the words “Heart” and “Love” more frequently than males, whereas males tend to use “Like” and “Good”, which display less intense emotions. These word choices indicate a male proclivity towards conveying emotions that are less intense and perhaps more casual.

In general, the most predominant emotions for both males and females in soccer tweets were joy (≈32% of tweets), anticipation (≈23%), and trust (≈22%). To understand gender differences, we compared the weekly distribution of emotions using bootstrapping with 80% subsamples over 1,000 iterations (Figure 5). The results show that the intensity of joy and anticipation is higher for females than for males, while the opposite holds for fear, anger, and disgust. For surprise and sadness, neither gender clearly predominates.
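The subsampling comparison can be sketched as follows for one emotion in one week. Assumptions: subsamples are drawn without replacement (the text does not say), the statistic is the difference of mean intensities (female minus male), and the 95% percentile interval is an illustrative summary; function names are hypothetical.

```python
import random
from statistics import mean


def subsample_differences(female, male, frac=0.8, iterations=1000, seed=0):
    """Repeatedly draw subsamples (without replacement, an assumption) holding
    `frac` of each gender's intensity scores, and return the list of
    differences in means (female minus male), one per iteration."""
    rng = random.Random(seed)
    k_f = max(1, int(frac * len(female)))
    k_m = max(1, int(frac * len(male)))
    return [
        mean(rng.sample(female, k_f)) - mean(rng.sample(male, k_m))
        for _ in range(iterations)
    ]


def percentile_interval(values, lo=2.5, hi=97.5):
    """Percentile interval using the value at the rounded index p/100 * (n - 1)."""
    ordered = sorted(values)
    last = len(ordered) - 1

    def pick(p):
        return ordered[min(last, max(0, round(p / 100 * last)))]

    return pick(lo), pick(hi)


# Toy joy intensities for one week: female scores shifted up by 0.1.
female_joy = [i / 100 for i in range(100)]
male_joy = [i / 100 - 0.1 for i in range(100)]
diffs = subsample_differences(female_joy, male_joy)
low, high = percentile_interval(diffs)
```

An interval that excludes zero suggests a consistent gender difference for that emotion and week; the same procedure can be repeated for every emotion and week.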

Overall, the word-cloud and emotion analyses suggest that women and men express themselves differently: they use different words and expressions with different intensity, which ultimately shapes how emotions are conveyed in communication.

Content Analysis

From the sentiment analyses, we noticed that irony and offensive content were not well detected and that the results for the emotion of joy and for positive sentiment were not aligned. We therefore analyzed the content for possible toxicity and abusive behaviour online. We used Google's Perspective API to estimate the levels of severe toxicity, insult, attack on the author, threat, identity attack, and sexually explicit content. Using the same bootstrapping technique (80% subsamples, 1,000 repetitions), we plotted the difference between the average values of each metric for each gender (Figure 6). The values are small (around 0.13), so we only analyze the gender differences. Men tend to express more severe toxicity, insults, and attacks on the author, whereas women tend to express more threats, identity attacks, and sexually explicit content. These results indicate that there is still a need to reduce abusive communication. Results for several other metrics are discussed in Appendix B.

Network Analysis

We build weighted cumulative networks (from the first week to the last) from the communication between users (nodes), where the link weight is the number of times two users interacted by replying or retweeting. For each language, there are three types of networks, based on (i) retweets, (ii) replies, and (iii) both retweets and replies (referred to as the combined network, based on all communications). The English networks contain 148,184 females (18.94%) and 634,366 males, and the Portuguese networks contain 50,354 females (25.85%) and 144,441 males. In total, there are 4,308,009 interactions in English (1,348,557 replies and 2,959,452 retweets) and 2,117,105 interactions in Portuguese (986,288 replies and 1,130,817 retweets). We analyze these networks to understand how information is shared in each language based on network structure and flow.
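Building a weighted cumulative network amounts to counting interactions per user pair up to a given week. The sketch below assumes interactions arrive as `(week, source, target, kind)` records and keeps edges directed; the text does not say whether direction is retained, so this is an assumption. Selecting `kinds` yields the retweet, reply, or combined network.

```python
from collections import Counter


def build_network(interactions, kinds=("retweet", "reply"), up_to_week=None):
    """interactions: iterable of (week, source_user, target_user, kind).

    Returns a Counter mapping directed (source, target) pairs to weights: the
    number of interactions of the selected kinds from the first week up to and
    including `up_to_week` (all weeks if None). kinds=("retweet",) gives the
    retweet network, ("reply",) the reply network, the default the combined one.
    """
    edges = Counter()
    for week, source, target, kind in interactions:
        if kind in kinds and (up_to_week is None or week <= up_to_week):
            edges[(source, target)] += 1
    return edges


log = [(1, "a", "b", "retweet"), (1, "a", "b", "reply"),
       (2, "b", "c", "reply"), (3, "a", "a", "reply")]
combined = build_network(log)
retweets_only = build_network(log, kinds=("retweet",))
week1 = build_network(log, up_to_week=1)
```

Calling the function with `up_to_week` set to 1, 2, ..., 12 reproduces the week-by-week cumulative snapshots.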

Communication Patterns

We first compute the Network Portrait Divergence (NPD) between networks, which estimates how different two networks are in structure, ranging from 0 (similar) to 1 (different) (Bagrow and Bollt 2019). The NPD method compares networks through their network portraits: a portrait is an (l, k) array whose entries count the number of nodes that have k nodes at distance l. NPD therefore provides an information-theoretic comparison of networks based on their structure at all scales, and it generalizes to weighted networks. The networks built from communication in the two languages tend to differ (NPD > 0.90; Table 3), except when we compare the networks built solely from retweets, for which NPD is close to 0.4. This means that replies play a major role in making these interactions different. Next, we use NPD to compare how women and men interact within each language. Overall, the networks of women and men tend to differ in structure (NPD values close to 1 for the combined networks). However, for replies in English, the network structure is similar for women and men (NPD = 0.28), while it is very different in Portuguese (NPD = 0.94), indicating that the Portuguese reply networks of women and men differ more in structure than the English ones.
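The network portrait that NPD builds on can be computed with one breadth-first search per node. The sketch below covers only the unweighted portrait B[l][k] described above; NPD then turns two portraits into probability distributions and compares them with a Jensen-Shannon divergence (Bagrow and Bollt 2019), a step that is not reproduced here. The adjacency-dict input format is an assumption.

```python
from collections import Counter, defaultdict, deque


def network_portrait(adj):
    """adj: dict mapping every node to an iterable of its neighbours
    (unweighted, undirected; all neighbours must also be keys).

    Returns B with B[ell][k] = number of nodes that have exactly k nodes at
    shortest-path distance ell (ell = 0 counts the node itself).
    """
    shells = []
    max_ell = 0
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:  # breadth-first search from `source`
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        sizes = Counter(dist.values())  # distance -> number of nodes at it
        shells.append(sizes)
        max_ell = max(max_ell, max(sizes))
    portrait = defaultdict(Counter)
    for sizes in shells:
        for ell in range(max_ell + 1):
            portrait[ell][sizes.get(ell, 0)] += 1  # k = 0 when no node is that far
    return portrait


path = {0: [1], 1: [0, 2], 2: [1]}  # path graph 0 - 1 - 2
B = network_portrait(path)
```

For the three-node path, every node has one node (itself) at distance 0; the middle node has two neighbours at distance 1 and none at distance 2.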

Tweets | Comparison | NPD
all | across languages | 0.90 (0.007)
all | English, across gender | 0.97 (0.01)
all | Portuguese, across gender | 0.99 (0.001)
retweets | across languages | 0.46 (0.04)
retweets | English, across gender | 0.39 (0.01)
retweets | Portuguese, across gender | 0.41 (0.02)
replies | across languages | 0.91 (0.007)
replies | English, across gender | 0.28 (0.02)
replies | Portuguese, across gender | 0.94 (0.001)

Network Characteristics

We now investigate the characteristics of these networks further. We compare the average values of multiple metrics computed from the networks built from all English and Portuguese tweets (Figure 7). Over the cumulative evolution of 12 weeks, the number of nodes and edges is higher for English, while density and average clustering tend to be lower. We argue that people tweeting in Portuguese are likely concentrated in fewer regions than those tweeting in English, a language spoken worldwide. Soccer might therefore be a more popular and well-connected local phenomenon in Brazil and Portugal, whereas the English tweets span several communities at once, which is reflected in connections within small groups (closer average clustering values) but not across the overall network (much lower density).

Even though English tweets come from people in many different regions, the fraction of women does not scale at the same rate as in Portuguese: the much smaller Portuguese community contains a much larger fraction of women. In general, women tend to be a minority in sports-related topics (Farrell et al. 2019; Thelwall and Stuart 2019), but the popularity of soccer in smaller communities (such as the Portuguese ones) might encourage women to participate more in the topic. This higher participation translates into more interaction across genders: women in the Portuguese network interact more with men than women in the English network do (lower gender homophily for Portuguese). To check that these differences are not a group-size effect, we also compute the assortativity of both networks and still observe the same patterns. Assortativity is computed using the adjusted nominal assortativity, which is defined explicitly for networks with asymmetric mixing between nodes of different groups (genders in our case) (Karimi and Oliveira 2023). Thus, the increasing fraction of women in the Portuguese network over the weeks affects the homophily values: as there are more women, homophily and assortativity decrease further.
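To make the homophily and assortativity measures concrete, the sketch below computes the share of same-gender edges and Newman's standard nominal assortativity, r = (Σ_i e_ii − Σ_i a_i b_i) / (1 − Σ_i a_i b_i). Note that this is a swapped-in, unadjusted measure: the paper uses the adjusted nominal assortativity of Karimi and Oliveira (2023), which accounts for unequal group sizes and is not implemented here. Treating edges as undirected and unweighted is also an assumption.

```python
from collections import Counter


def gender_mixing(edges, gender):
    """edges: iterable of (u, v) pairs, treated as undirected; self-loops skipped.
    gender: dict node -> group label.

    Returns (share of edges joining same-gender nodes,
             Newman's unadjusted nominal assortativity r).
    """
    e = Counter()  # symmetric mixing counts between group labels
    m = 0
    for u, v in edges:
        if u == v:
            continue
        gu, gv = gender[u], gender[v]
        e[(gu, gv)] += 1  # count both orientations to symmetrise
        e[(gv, gu)] += 1
        m += 1
    if m == 0:
        raise ValueError("need at least one edge between distinct nodes")
    total = 2 * m
    groups = {g for pair in e for g in pair}
    same = sum(e[(g, g)] for g in groups) / total
    a = {g: sum(e[(g, h)] for h in groups) / total for g in groups}
    expected = sum(a[g] ** 2 for g in groups)  # a_i = b_i after symmetrising
    if expected == 1:
        return same, float("nan")  # only one group present: r is undefined
    return same, (same - expected) / (1 - expected)


gender = {"a": "F", "b": "F", "c": "M", "d": "M"}
within = gender_mixing([("a", "b"), ("c", "d")], gender)
across = gender_mixing([("a", "c"), ("b", "d")], gender)
```

Perfectly segregated ties give r = 1 and ties only across genders give r = -1; real networks fall in between.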

Next, we look at self-loops in the networks (Figure 7.H). The English network has more self-loops than the Portuguese one, as expected given its size. However, when we disaggregate self-loops by gender (Figure 8.A), around 10% of the edges from men's interactions are self-loops, which is half the share observed for women's interactions. This indicates that women in our sample tend to retweet and reply to their own content more than men. Disaggregating other network metrics by gender, we also see that women have denser networks, whereas men tend to have higher average clustering and higher core numbers. This indicates that women tend to be more connected in general, while men tend to act more as hubs and to acquire more central positions.
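The per-gender self-loop share can be sketched as the fraction of a gender's outgoing edges whose source and target coincide. Attributing each edge to the gender of its source node, and counting edges rather than weights by default, are assumptions about how this quantity was computed.

```python
from collections import Counter


def self_loop_share(edges, gender, weighted=False):
    """edges: dict mapping directed (source, target) pairs to weights.
    gender: dict node -> gender label.

    Returns {gender: percentage of that gender's outgoing edges that are
    self-loops}. Edges are attributed to the source's gender (an assumption);
    with weighted=True, weights are summed instead of counting edges.
    """
    loops, totals = Counter(), Counter()
    for (source, target), weight in edges.items():
        amount = weight if weighted else 1
        g = gender[source]
        totals[g] += amount
        if source == target:
            loops[g] += amount
    return {g: 100 * loops[g] / totals[g] for g in totals}


edges = {("a", "a"): 3, ("a", "b"): 1, ("c", "c"): 1,
         ("c", "d"): 1, ("c", "e"): 1, ("d", "c"): 1}
gender = {"a": "female", "b": "male", "c": "male", "d": "male", "e": "female"}
by_count = self_loop_share(edges, gender)
by_weight = self_loop_share(edges, gender, weighted=True)
```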

The core number of a node reflects its influence, and the nodes in the innermost core (those with the highest core number) are the most influential nodes (Kitsak et al. 2010). The maximum core number is 33 for English and 28 for Portuguese. The innermost cores contain 169 people in Portuguese and 214 people in English, with women making up 14.79% and 12.61%, respectively. Even though women make up a higher share of the most influential users in Portuguese than in English, the women ratio drops from the overall network (25.85% in Portuguese and 18.94% in English) to the innermost core by 11.06 and 6.33 percentage points, respectively, indicating that women are not as influential as men in either language.
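The innermost-core analysis can be sketched with a simple peeling algorithm for core numbers followed by the women share in the maximum core. This O(n²) version is for illustration only (libraries such as NetworkX provide `core_number`); it assumes an undirected graph without self-loops, and the helper names are hypothetical. The drops quoted above are differences in percentage points: 25.85 − 14.79 = 11.06 and 18.94 − 12.61 = 6.33.

```python
def core_numbers(adj):
    """adj: dict node -> set of neighbours (undirected, no self-loops).

    O(n^2) peeling: repeatedly remove a node of minimum remaining degree;
    its core number is the largest such minimum degree seen so far.
    """
    degree = {u: len(neighbours) for u, neighbours in adj.items()}
    remaining = set(adj)
    core = {}
    k = 0
    while remaining:
        u = min(remaining, key=degree.__getitem__)
        k = max(k, degree[u])
        core[u] = k
        remaining.remove(u)
        for v in adj[u]:
            if v in remaining:
                degree[v] -= 1
    return core


def women_share_in_max_core(adj, gender):
    """Return (maximum core number, innermost-core size, % women in it)."""
    core = core_numbers(adj)
    k_max = max(core.values())
    members = [u for u, c in core.items() if c == k_max]
    women = sum(1 for u in members if gender[u] == "female")
    return k_max, len(members), 100 * women / len(members)


# Triangle a-b-c plus a pendant node d attached to a.
adj = {"a": {"b", "c", "d"}, "b": {"a", "c"}, "c": {"a", "b"}, "d": {"a"}}
gender = {"a": "female", "b": "male", "c": "male", "d": "female"}
k_max, size, share = women_share_in_max_core(adj, gender)
```

Comparing `share` with the women fraction of the whole network gives the periphery-to-core drop discussed above.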

Meso-scale Structures

We further study the organization of nodes in the network based on meso-scale structures, including core-periphery structure and communities (Saxena and Iyengar 2016). We identify the core-periphery structure using the k-shell decomposition method (Kitsak et al. 2010). In Figure 9, we plot the evolution of metrics across cores in both networks built from all tweets; English reaches a higher maximum core number than Portuguese. We first study how homophily varies as we move from the periphery to the core to understand how different genders connect at different levels of influence. Many previous works have shown that women do not acquire top positions in networks due to the glass-ceiling effect (Avin et al. 2015; Stoica, Riederer, and Chaintreau 2018). Figure 9 shows the fraction of females in different layers from periphery to core in both networks: when a Portuguese core reaches a fraction of women similar to that of an English core, it has a much lower homophily than in English. As we move from the periphery toward the core (up to core number 15 in both languages), homophily increases, the fraction of women decreases, and density remains similar across languages. However, the average clustering increases much more for Portuguese than for English, probably because the people in the Portuguese network belong to a specific region.

Next, we study how females and males are embedded in different communities (Figure 10). Communities are extracted using the Leiden community detection algorithm (Traag, Waltman, and van Eck 2019). In the English network, homophily increases as community size decreases, and the biggest community has the lowest homophily. The same happens in the Portuguese network, but with much higher variance. The English network is more homophilic overall (Figure 7), and this trend extends to the identified communities. The increase in homophily does not correlate with the fraction of women or with average clustering; however, for Portuguese, density increases as community size decreases and homophily increases.

Discussion

In this paper, we study Twitter communication in English and Portuguese about soccer (or football), which is highly popular and the most played sport in the world, from a gender perspective. Existing works (Nilizadeh et al. 2016; Manzano and Sánchez-Giménez 2019; Amarasekara and Grant 2019) emphasize gender gaps in communication within specialized fields such as STEM and politics, where discussion requires specific training and understanding. In contrast, we focus on a topic that encourages people to interact regardless of their expertise, making it more accessible. Examining the extent to which women and men communicate differently about soccer on Twitter is not only intriguing but also important for understanding gender-centric communication dynamics. Our findings aim to expand the understanding of how the popularity and familiarity of a topic influence communication, and to identify the gaps that interventions could target.

We collected soccer-related Twitter data for three months (March 7 to June 6, 2022) and identified the genders of users using Genderize and Namepedia. The scope of our study is confined to binary gender, and tweets whose authors' gender could not be identified were excluded from further analysis. The final dataset contains 7 million tweets in English (from 2 million users) and 2.6 million tweets in Portuguese (from 0.5 million users). We then analyzed the content of the tweets and the interaction patterns from a gender perspective to observe how a male-dominated environment affects communication. We found that women tend to be a minority in participation and representation, but they express sentiments and emotions more intensely.

There are several similarities between women and men, such as similar use of the top 20 most frequently used emojis and of hashtags. However, women use more emojis than men, although this does not carry over to hashtag usage. Men post more positive tweets, and women post more neutral tweets. Gender differences in negative tweets were not consistent, possibly because of the diversity of negative emotions that the class “negative” can represent. Our analysis of abusive behaviour shows that women and men display different kinds of “negative” tweets, which are hard to capture in an aggregated manner.

We then analysed the emotions detected in the tweets in more detail. Women tend to express higher levels of joy and anticipation than men in both languages, while disgust, anger, and fear tend to be more gender-neutral, with slightly higher levels for males. Women display more intense sentiments and emotions around events than men. In addition, tweets posted by men have higher toxicity, profanity, insults, abusive language, and attacking writing style, whereas women's tweets are more incoherent and sexually explicit and contain more identity attacks. Interestingly, we did not find any notable gender-based emotional difference between English and Portuguese, and emotional responses appear to be unaffected by the overall network structure.

We further constructed retweet, reply, and combined networks from Portuguese and English tweets. The reply and combined networks differ substantially across genders (as measured by NPD), highlighting a marked difference in reply-based communication across genders. However, the retweet networks are not very different across languages or genders (NPD around 0.4). The Portuguese network exhibits a higher proportion of women in interactions and lower homophily than the English network. This difference could be attributed to the regional focus of Portuguese tweets, which come mainly from Brazil and Portugal, where soccer is highly popular. Women's networks in Portuguese show denser structures, with higher average clustering and lower assortativity than those of men.

Women are less influential: the fraction of women decreases from the periphery to the core of the network, by 11.06% in Portuguese and by 6.33% in English. In both languages, small communities are more homophilic. In Portuguese, homophily decreases with community size, so larger communities are less homophilic, while the fraction of women stays roughly the same in most communities. In English, however, homophily differs little across communities.
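The periphery-to-core trend can be examined with a k-core (k-shell) decomposition, which Kitsak et al. (2010) use to identify influential spreaders. The sketch below assumes an undirected simple graph stored as an adjacency dict of sets and a hypothetical `gender` mapping from node to label. It assigns each node its core number by repeatedly peeling low-degree nodes, then reports the fraction of women in each shell. How the study groups shells into “periphery” and “core” is not specified here.

```python
from collections import defaultdict


def core_numbers(adj):
    """Core number of every node in an undirected simple graph (adjacency dict of sets)."""
    degree = {u: len(adj[u]) for u in adj}
    remaining = set(adj)
    core = {}
    k = 0
    while remaining:
        # The shell index never decreases as nodes are peeled away.
        k = max(k, min(degree[u] for u in remaining))
        stack = [u for u in remaining if degree[u] <= k]
        while stack:
            u = stack.pop()
            if u not in remaining:  # already peeled via an earlier push
                continue
            remaining.discard(u)
            core[u] = k
            for v in adj[u]:
                if v in remaining:
                    degree[v] -= 1
                    if degree[v] <= k:
                        stack.append(v)
    return core


def women_fraction_by_shell(adj, gender):
    """Fraction of nodes labelled 'female' in each k-shell, ordered from periphery to core."""
    core = core_numbers(adj)
    totals, women = defaultdict(int), defaultdict(int)
    for node, k in core.items():
        totals[k] += 1
        if gender.get(node) == "female":
            women[k] += 1
    return {k: women[k] / totals[k] for k in sorted(totals)}


# Toy graph: a triangle (0, 1, 2) with a pendant node 3 attached to node 0.
adj = {0: {1, 2, 3}, 1: {0, 2}, 2: {0, 1}, 3: {0}}
gender = {0: "female", 1: "male", 2: "male", 3: "female"}
print(women_fraction_by_shell(adj, gender))
```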

Our observations indicate a strong correlation between a topic’s popularity and lower homophily in the network, suggesting a safer space for communication between genders. This, however, does not influence how individuals express their emotions. The substantial communication gap highlighted in this study is eye-opening and underscores the need to bridge it and to create safe online spaces for discussing such topics. While we do not propose intervention methods in this work, we believe that the observed outcomes can serve as a foundation for them. In the future, we would like to further study how the popularity and accessibility of topics correlate with the homophily of their networks and with writing style. If similar results are observed, awareness and education programs could be used to bridge these gaps. Similar techniques could be applied to highly domain-specific topics, such as STEM.
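Homophily comparisons depend on how homophily is measured, and Karimi and Oliveira (2023) argue that nominal assortativity can be inadequate for this purpose. This sketch is not necessarily the measure used in this study. For an undirected edge list with a hypothetical `gender` mapping, it reports two simple quantities. The first is the raw share of same-gender edges. The second is Newman’s nominal assortativity coefficient, which compares that share with the share expected under random mixing.

```python
from collections import defaultdict


def gender_mixing(edges, gender):
    """Same-gender edge share and Newman's nominal assortativity for an undirected edge list.

    edges: iterable of (u, v) pairs, each undirected edge listed once.
    gender: mapping from node to a categorical label (hypothetical input format).
    """
    e = defaultdict(float)  # e[(g, h)]: edge ends in group g whose other end is in group h
    m = same = 0
    for u, v in edges:
        gu, gv = gender[u], gender[v]
        # Count each undirected edge in both directions so the mixing matrix is symmetric.
        e[(gu, gv)] += 1
        e[(gv, gu)] += 1
        same += gu == gv
        m += 1
    if m == 0:
        raise ValueError("need at least one edge")
    groups = {g for g, _ in e}
    total = 2 * m
    # a[g]: fraction of all edge ends attached to nodes in group g.
    a = {g: sum(e.get((g, h), 0.0) for h in groups) / total for g in groups}
    observed = sum(e.get((g, g), 0.0) for g in groups) / total  # equals same / m
    expected = sum(x * x for x in a.values())  # same-group share under random mixing
    r = (observed - expected) / (1 - expected) if expected < 1 else float("nan")
    return same / m, r


edges = [(0, 1), (2, 3), (1, 2)]
labels = {0: "female", 1: "female", 2: "male", 3: "male"}
print(gender_mixing(edges, labels))
```

Reporting both numbers makes it visible when the raw same-gender share and the chance-corrected coefficient point in different directions.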

We acknowledge some limitations of this work. First, Twitter’s user base may not give a comprehensive picture of the world, as the platform is predominantly used by specific groups, such as white, male, and upper-middle-class people, which can limit the generalizability of the results. Nevertheless, studying the use of online social media such as Twitter can help make these platforms more inclusive. Second, some methodologies, such as the API used for sentiment analysis, still need improvement and refinement. In our case, many tweets containing offensive or ironic content were classified as positive. To mitigate this, we analysed the data from several complementary perspectives to improve the robustness of the results. For instance, the emotion results were manually checked and tend to be more accurate.

Our study further explores gender variations in communication patterns, highlighting potential misperceptions of free speech on social media. Although soccer is more popular among Portuguese speakers, we observe similarities (such as the less influential position of women) as well as variations (such as differences in network structure) in communication patterns across languages. In the future, we plan to further investigate the explainability of patterns extracted from the networks and their communities. Additionally, we plan to compare these patterns with those of a sport or activity predominantly followed by women, such as ballet. We posit that different sports may lead people to communicate in distinct ways.

References

  • Amarasekara, I.; and Grant, W. J. 2019. Exploring the YouTube science communication gender gap: A sentiment analysis. Public Understanding of Science, 28(1): 68–84.
  • Avin, C.; Keller, B.; Lotker, Z.; Mathieu, C.; Peleg, D.; and Pignolet, Y.-A. 2015. Homophily and the glass ceiling effect in social networks. In Proceedings of the 2015 Conference on Innovations in Theoretical Computer Science, 41–50.
  • Bagrow, J. P.; and Bollt, E. M. 2019. An information-theoretic, all-scales approach to comparing networks. Applied Network Science, 4(1): 1–15.
  • Chanis, A. 17 Reasons why soccer is the most popular sport in the world. https://mastersoccermind.com/17-reasons-why-soccer-is-the-most-popular-sport-in-the-world/. Online; accessed 2-Aug-2022.
  • Chaplin, T. 2015. Gender and Emotion Expression: A Developmental Contextual Perspective. Emotion Review: Journal of the International Society for Research on Emotion, 7: 14–21.
  • Chen, Z.; Lu, X.; Ai, W.; Li, H.; Mei, Q.; and Liu, X. 2018. Through a Gender Lens. In Proceedings of the 2018 World Wide Web Conference (WWW ’18). ACM Press.
  • Cheng, I.; Heyl, J.; Lad, N.; Facini, G.; and Grout, Z. 2021. Evaluation of Twitter data for an emerging crisis: an application to the first wave of COVID-19 in the UK. Scientific Reports, 11(1): 1–13.
  • Clavio, G.; Walsh, P.; and Coyle, P. 2013. The effects of gender on perceptions of team Twitter feeds. Global Sport Business Journal, 1(1).
  • Farrell, T.; Fernandez, M.; Novotny, J.; and Alani, H. 2019. Exploring misogyny across the manosphere in Reddit. In Proceedings of the 10th ACM Conference on Web Science, 87–96.
  • Garcia, D.; Weber, I.; and Garimella, V. 2014. Gender Asymmetries in Reality and Fiction: The Bechdel Test of Social Media. Proceedings of the International AAAI Conference on Web and Social Media, 8(1): 131–140.
  • Genderize.io. https://genderize.io/. Online; accessed 2-Aug-2022.
  • Holmberg, K.; and Hellsten, I. 2015. Gender differences in the climate change communication on Twitter. Internet Research.
  • Hutto, C.; and Gilbert, E. 2014. VADER: A Parsimonious Rule-Based Model for Sentiment Analysis of Social Media Text. Proceedings of the International AAAI Conference on Web and Social Media, 8(1): 216–225.
  • Karimi, F.; and Oliveira, M. 2023. On the inadequacy of nominal assortativity for assessing homophily in networks. Scientific Reports, 13(1): 21053.
  • Kitsak, M.; Gallos, L. K.; Havlin, S.; Liljeros, F.; Muchnik, L.; Stanley, H. E.; and Makse, H. A. 2010. Identification of influential spreaders in complex networks. Nature Physics, 6(11): 888–893.
  • Lange, D. 2020. Soccer fans by gender 2019.
  • Lingiardi, V.; Carone, N.; Semeraro, G.; Musto, C.; D’Amico, M.; and Brena, S. 2020. Mapping Twitter hate speech towards social and sexual minorities: a lexicon-based approach to semantic content analysis. Behaviour & Information Technology, 39(7): 711–721.
  • Manzano, C.; and Sánchez-Giménez, J. A. 2019. Women, gender and think tanks: political influence network in Twitter 2018.
  • Matalon, Y.; Magdaci, O.; Almozlino, A.; and Yamin, D. 2021. Using sentiment analysis to predict opinion inversion in Tweets of political communication. Scientific Reports, 11.
  • May, A.; Wachs, J.; and Hannák, A. 2019. Gender differences in participation and reward on Stack Overflow. Empirical Software Engineering, 24(4): 1997–2019.
  • Messias, J.; Vikatos, P.; and Benevenuto, F. 2017. White, man, and highly followed: Gender and race inequalities in Twitter. In Proceedings of the International Conference on Web Intelligence, 266–274.
  • Mohammad, S. M. 2018. Word Affect Intensities. In Proceedings of the 11th Edition of the Language Resources and Evaluation Conference (LREC-2018). Miyazaki, Japan.
  • namepedia.org. http://www.namepedia.org/. Online; accessed 11-Nov-2020.
  • Nilizadeh, S.; Groggel, A.; Lista, P.; Das, S.; Ahn, Y.-Y.; Kapadia, A.; and Rojas, F. 2016. Twitter’s glass ceiling: The effect of perceived gender on online visibility. In Proceedings of the International AAAI Conference on Web and Social Media, volume 10, 289–298.
  • Oxford University Press. 2015. Word of the Year 2015. https://languages.oup.com/word-of-the-year/2015/. Online; accessed 2-Aug-2022.
  • Robertson, A.; Liza, F. F.; Nguyen, D.; McGillivray, B.; and Hale, S. A. 2021. Semantic Journeys: Quantifying Change in Emoji Meaning from 2012-2018.
  • Robertson, A.; Magdy, W.; and Goldwater, S. 2021a. Black or White but never neutral: How readers perceive identity from yellow or skin-toned emoji.
  • Robertson, A.; Magdy, W.; and Goldwater, S. 2021b. Identity Signals in Emoji Do not Influence Perception of Factual Truth on Twitter.
  • Saxena, A.; and Iyengar, S. 2016. Evolving models for meso-scale structures. In 2016 8th International Conference on Communication Systems and Networks (COMSNETS), 1–8. IEEE.
  • Saxena, A.; Reddy, H.; and Saxena, P. 2022a. Introduction to sentiment analysis covering basics, tools, evaluation metrics, challenges, and applications. Principles of Social Networking: The New Horizon and Emerging Challenges, 249–277.
  • Saxena, A.; Reddy, H.; and Saxena, P. 2022b. Recent developments in sentiment analysis on social networks: Techniques, datasets, and open issues. Principles of Social Networking: The New Horizon and Emerging Challenges, 279–306.
  • Schischlik, A. 2023. Women and football: #ChangeTheGame - towards gender equality in sports. Online; accessed 14-Jan-2023.
  • Shah, N.; and Rohilla, S. 2022. Emot API. https://pypi.org/project/emot/. Online; accessed 2-Aug-2022.
  • Stoica, A.-A.; Riederer, C.; and Chaintreau, A. 2018. Algorithmic glass ceiling in social networks: The effects of social recommendations on network diversity. In Proceedings of the 2018 World Wide Web Conference, 923–932.
  • Stracqualursi, L.; and Agati, P. 2022. Tweet topics and sentiments relating to distance learning among Italian Twitter users. Scientific Reports, 12.
  • Thelwall, M.; and Foster, D. 2021. Male or female gender-polarized YouTube videos are less viewed. Journal of the Association for Information Science and Technology, 72.
  • Thelwall, M.; and Stuart, E. 2019. She’s Reddit: A source of statistically significant gendered interest information? Information Processing & Management, 56(4): 1543–1558.
  • Traag, V. A.; Waltman, L.; and van Eck, N. J. 2019. From Louvain to Leiden: guaranteeing well-connected communities. Scientific Reports, 9(1): 5233.
  • Twitter. How to use hashtags. https://help.twitter.com/en/using-twitter/how-to-use-hashtags. Online; accessed 18-Jul-2022.
  • Wagner, C.; Garcia, D.; Jadidi, M.; and Strohmaier, M. 2015. It’s a Man’s Wikipedia? Assessing Gender Inequality in an Online Encyclopedia.
  • Williams, J.; Chinn, S. J.; and Suleiman, J. 2014. The value of Twitter for sports fans. Journal of Direct, Data and Digital Marketing Practice, 16(1): 36–50.
  • Wold, H. M.; Vikre, L.; Gulla, J. A.; Özgöbek, Ö.; and Su, X. 2016. Twitter Topic Modeling for Breaking News Detection. In Proceedings of the 12th International Conference on Web Information Systems and Technologies - Volume 2: WEBIST, 211–218. INSTICC, SciTePress. ISBN 978-989-758-186-1.
  • Yoon, J.; Smith, C.; Hyung Kim, A. C.; Clavio, G.; Witkemper, C.; and Pedersen, P. M. 2014. Gender Effects on Sport Twitter Consumption: Differences in Motivations and Constraints. Journal of Multidisciplinary Research (1947-2900), 6(3).

Appendix A

In Figure 11, we show the sentiment analysis of the English data, categorized by gender. We observe that sentiment intensity differs between the genders, with women displaying higher intensity. We also highlight the events corresponding to peaks of positive and negative sentiment; the same events can also be linked to the emotion analysis.
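The appendix does not state how sentiment peaks were identified. One simple, hypothetical rule flags days whose aggregate score exceeds the series mean by more than z standard deviations. An example of such a score is the daily share of positive tweets by women. The function and the threshold z = 2 below are assumptions for illustration.

```python
from statistics import mean, pstdev


def peak_days(daily_scores, z=2.0):
    """Return the days whose score exceeds the series mean by more than z population std devs.

    daily_scores: dict mapping a date label to an aggregate score for that day
    (hypothetical input, e.g. the share of positive tweets by one gender).
    """
    values = list(daily_scores.values())
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:  # a flat series has no peaks
        return []
    return [day for day, value in daily_scores.items() if value > mu + z * sigma]


series = {f"d{i}": 0.2 for i in range(10)}
series["d9"] = 0.9  # a single spike, e.g. a match day
print(peak_days(series))
```

Running the same rule on the daily share of negative tweets gives the negative-sentiment peaks; with short series, a single spike may not clear a z = 2 threshold.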

[Figure 11: Sentiment analysis of English tweets by gender, with events at positive and negative sentiment peaks highlighted]

Appendix B

In Figure 12, we show the plots for all attributes computed using the Google Perspective API to capture different dimensions of the tweets’ content. Toxicity manifests in tweets that are impolite, disrespectful, or unreasonable, with the potential to drive individuals away from a discussion. Severe toxicity refers to a highly hateful, aggressive, or disrespectful comment, designed to strongly discourage a user from participating in a discussion or expressing their perspective. Profanity captures swear words, curse words, and other obscene or profane language. As discussed in the main text, men’s tweets are more toxic and also score higher on severe toxicity. Beyond toxicity, men write in a style that reads as more attacking towards the author of a post and towards other commenters. Men also post more insulting content, where an insult is an insulting, inflammatory, or negative comment directed towards an individual or a group of people. In both languages, men use more abusive and swear words, as reflected in the profanity scores.

A threat expresses an intention to cause harm, injury, or violence against an individual or a group; we find no significant gender difference in threatening language. Identity attack refers to negative or hateful comments that specifically target someone based on their identity. Women’s tweets are also more incoherent than men’s, where incoherent text is difficult to understand or nonsensical. Irrelevant text is considered spam, and women’s tweets are more often identified as spam. Sexually explicit content contains references to sexual acts, body parts, or other lewd material, and women raise such points more often in online communication. Inflammatory text is intended to provoke or inflame, and obscene text uses obscene or vulgar language, including cursing; for both attributes, the gender difference is not significant.
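To make the attribute scoring concrete, the sketch below builds the request body for the Perspective API’s `comments:analyze` method and reads summary scores from a response. In real use, the body would be sent as a POST to https://commentanalyzer.googleapis.com/v1alpha1/comments:analyze with an API key. The code only builds and parses JSON, so it runs without a network call or key. The attribute list mirrors the dimensions discussed above. Several of these attributes are experimental, may support only English, or may have been retired since the data were analysed. The helper names and the sample response are hypothetical.

```python
import json

# Attribute names for the dimensions discussed above. Several (e.g. INCOHERENT, SPAM,
# ATTACK_ON_AUTHOR) are experimental and may be unavailable for some languages.
ATTRIBUTES = [
    "TOXICITY", "SEVERE_TOXICITY", "IDENTITY_ATTACK", "INSULT", "PROFANITY",
    "THREAT", "SEXUALLY_EXPLICIT", "ATTACK_ON_AUTHOR", "ATTACK_ON_COMMENTER",
    "INCOHERENT", "INFLAMMATORY", "OBSCENE", "SPAM",
]


def build_request(text, language):
    """Build the JSON body for a comments:analyze request (not sent here)."""
    return {
        "comment": {"text": text},
        "languages": [language],
        "requestedAttributes": {name: {} for name in ATTRIBUTES},
    }


def summary_scores(response):
    """Extract {attribute: summary score in [0, 1]} from a parsed response dict."""
    return {
        name: details["summaryScore"]["value"]
        for name, details in response.get("attributeScores", {}).items()
    }


payload = json.dumps(build_request("What a goal!", "en"))
# Made-up response, shaped like a comments:analyze reply, for illustration only.
fake_response = {"attributeScores": {"TOXICITY": {"summaryScore": {"value": 0.12, "type": "PROBABILITY"}}}}
print(summary_scores(fake_response))
```

Scores would then be compared between the tweets of women and men separately for each attribute and language.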

The way men and women tweet and present their opinions and emotions tends to be consistent across both languages.

[Figure 12: Google Perspective API attribute scores by gender]