Introduction

The circumstances facing the pharmacy profession are unprecedented, even setting aside the COVID-19 crisis. The last two decades have brought significant changes to the profession. Notable changes include mass adoption of the PharmD degree, the ongoing shift from product-based reimbursement to service-based payment, and a significant increase in the number of pharmacy schools, accompanied by an increase in the number of pharmacy students in the USA from 35,885 in 2001 to 65,101 in 2019 (1). At the same time, pharmacy schools saw a large decrease in applicants: 50,842 total applications were received in 2018-2019 (1), a 52.4% decrease from the 106,815 applications received in 2011. What explains a large increase in pharmacy schools alongside a significant decrease in applicants? Multiple factors are likely at work, arguably the most influential being the poor job market. One factor that may be overlooked is that prospective pharmacy students look to social media for input on the pharmacy career path.

Social media plays an influential role in how students make career decisions (2,3,4). Online social networking allows quick, seamless communication between peers expressing opinions. Reddit and Student Doctor Network, among others, allow anonymous posting, which encourages users to express opinions without fear of workplace repercussions. This is particularly valuable when their comments are critical of their employer. However, negative posts, particularly about the saturated pharmacy job market, may affect a student’s decision to consider a career in pharmacy. Students can make career decisions haphazardly (5), without truly considering and reflecting on a career path. Could negative forum posts influence students to avoid choosing pharmacy as a profession?

It is not known why students do not choose pharmacy as a career path, nor how social media influences prospective pharmacy students’ career decisions. To the authors’ knowledge, no study has reviewed online social media regarding the pharmacy profession or social media’s influence on pharmacy careers. This study set out to review the subreddits r/pharmacy, r/pharmacy school, and r/pre-pharmacy to determine whether there is an identifiable word sentiment indicating positivity or negativity towards the profession. The authors expect that Natural Language Processing (NLP) methods will make it possible to determine whether the majority of top posts in the pharmacy subreddits are mainly positive or mainly negative.

This study’s hypothesis is that the majority of online subreddit posts will contain an overall negative word sentiment.

Methods

Sentiment analysis

Sentiment analysis is a field of NLP that uses systematic, automated techniques to break down language and mine emotions, views, opinions, and attitudes. This study focused on the emotional component of sentiment analysis and parsed each sentence into words in order to score their sentiment.
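The word-level approach described above can be illustrated with a minimal Python sketch. It is not the study's code: the tiny lexicon below is a hypothetical stand-in for a real sentiment word list, and only the +2 value for “like” is taken from this article; the other values and the function names are assumptions made for the example.

```python
import re

# Illustrative stand-in for a sentiment word list. Only the +2 for "like"
# is stated in this article; the other values are assumed for the example.
TOY_LEXICON = {"like": 2, "love": 3, "bad": -3, "pay": -1}


def tokenize(sentence):
    """Lowercase a sentence and split it into simple word tokens."""
    return re.findall(r"[a-z']+", sentence.lower())


def score_words(sentence, lexicon=TOY_LEXICON):
    """Return (word, score) pairs; words missing from the lexicon score 0."""
    return [(word, lexicon.get(word, 0)) for word in tokenize(sentence)]


scored = score_words("I like the work, but the pay is bad.")
total = sum(score for _, score in scored)
print(scored)
print("sentence total:", total)
```

Most words in an ordinary sentence receive a score of zero under this kind of lookup, which is why the handling of neutral words matters for the final result.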

Data extraction from Reddit

Reddit is one of the largest anonymous forums on the internet. It has three pharmacy-related sub-forums, known as subreddits: r/pre-pharmacy, r/pharmacy, and r/pharmacy school. Each subreddit represents a Reddit population within the pharmacy community, though these populations are not necessarily mutually exclusive.

Reddit provides an authorized application programming interface (API) that gives online access for programming purposes. Anyone can view, comment on, and scrape authorized posts through the API, which this study accessed using Python.

The steps are shown in Figure X.1. The process starts with creating an authorized instance on a Reddit account. The Python package PRAW was used to access the subreddits and scrape data. This study extracted top posts from June 1st 2019 to June 1st 2020. Sentiment was scored with the AFINN library, developed by Finn Årup Nielsen, whose word list assigns each word a value from -5 to +5. The scoring steps are shown in Figure X.2. The process breaks the sentences from topics and comments down into single words. Applying AFINN then yields a score for each word from the subreddit topics and comments.
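A rough sketch of the extraction step is shown below, assuming PRAW's standard read-only workflow. The credentials are placeholders, the function names are hypothetical, and the choice to filter an all-time "top" listing by creation date is an assumption; the article does not say how the study restricted posts to its date window.

```python
from datetime import datetime, timezone

# Study window taken from the article: June 1st 2019 to June 1st 2020 (UTC assumed).
START = datetime(2019, 6, 1, tzinfo=timezone.utc).timestamp()
END = datetime(2020, 6, 1, tzinfo=timezone.utc).timestamp()


def in_study_window(created_utc, start=START, end=END):
    """True if a post's creation timestamp falls inside the study window."""
    return start <= created_utc <= end


def build_client():
    """Create a read-only PRAW client. All credential values are placeholders."""
    import praw  # imported lazily so the helpers above work without PRAW installed

    return praw.Reddit(
        client_id="YOUR_CLIENT_ID",
        client_secret="YOUR_CLIENT_SECRET",
        user_agent="pharmacy-sentiment-sketch",
    )


def collect_text(reddit, subreddit_name, limit=100):
    """Gather titles and comment bodies from top posts inside the window."""
    texts = []
    listing = reddit.subreddit(subreddit_name).top(time_filter="all", limit=limit)
    for submission in listing:
        if not in_study_window(submission.created_utc):
            continue
        texts.append(submission.title)
        submission.comments.replace_more(limit=0)  # drop "load more comments" stubs
        texts.extend(comment.body for comment in submission.comments.list())
    return texts

# Example call (requires real credentials and network access):
# texts = collect_text(build_client(), "pharmacy")
```

The collected strings can then be split into words and scored against the AFINN word list.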

The Data Process Flowchart Figure X.1

Exclusion

All neutral or “zero-scored” words were excluded from the sentiment analysis because they can decrease sensitivity and dilute the overall sentiment. This decision is based on the distribution shown in Figure X.3. Zero-scored words represented over 50% of the data sample and included mistyped words and stop words. The Stanford NLP group describes stop words as “extremely common words which would appear to be of little value in helping select documents matching a user need” (6).
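The exclusion rule can be sketched in a few lines of Python. The sample data and function names are hypothetical and serve only to show how dropping zero-scored words works and how their share of the sample could be measured.

```python
def exclude_neutral(scored_words):
    """Keep only (word, score) pairs whose score is not zero."""
    return [(word, score) for word, score in scored_words if score != 0]


def neutral_share(scored_words):
    """Fraction of words that received a zero score."""
    if not scored_words:
        return 0.0
    zeros = sum(1 for _, score in scored_words if score == 0)
    return zeros / len(scored_words)


# Hypothetical sample: stop words and a typo ("teh") all score zero.
sample = [("i", 0), ("like", 2), ("the", 0), ("pay", -1), ("teh", 0)]
kept = exclude_neutral(sample)
print(kept, neutral_share(sample))
```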

The statistical analysis applied these exclusion criteria to the data, then reported the ratio of positive to negative words along with basic descriptive statistics. A sensitivity analysis examined how the data affect the positive-to-negative ratio.

Results

The study findings are presented in Table X.1. The ratio of positive to negative words is 1.380, demonstrating a positive skew, as depicted in Figure X.3. This study reviewed 299 top subreddit posts and examined 21,850 total words using the AFINN library, of which 12,669 were positive and 9,181 were negative. The tornado graph in Figure X.4 ranks the words that have the greatest impact on the positive-to-negative ratio.
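The reported ratio follows directly from the two word counts. The short check below uses only the figures stated in this article; the variable names are chosen for the example.

```python
positive_words = 12_669
negative_words = 9_181

total_words = positive_words + negative_words
ratio = positive_words / negative_words
positive_share = positive_words / total_words

print(f"total scored words: {total_words}")
print(f"positive-to-negative ratio: {ratio:.3f}")
print(f"share of positive words: {positive_share:.1%}")
```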

Figure X.4 displays the 25 top-scored words in the post titles and comments, with their accumulated scores. The higher the accumulated score, the greater the word’s impact on the score ratio. The word “like” appears most often among the positively scored words and carries an AFINN score of +2. This is reflected in Figure X.3, where the positive word score +2 has the most entries, while the negative word scores are distributed more evenly. Figure X.5 shows the most common words and their counts in post titles and comments. The word “pay” is the most frequently used negative word, and the expletive “shit” was the highest-scoring negative word. Both figures show that there are more positive words than negative words under the AFINN scoring system.

Word Score Characteristics Data Table X.1

Characteristic                   Value
Average Word Score               0.266
Median Word Score                1
Standard Deviation               2.102
Total Word Count                 21,850
Negative Words                   9,181
Positive Words                   12,669
Positive to Negative Ratio       1.380
Number of posts                  299
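Descriptive statistics like those in Table X.1 could be computed as in the following sketch. The scores are made-up toy values, and the use of the sample standard deviation is an assumption, since the article does not say which form was used.

```python
import statistics


def summarize(scores):
    """Summarize word scores after dropping zero-scored (neutral) words."""
    nonzero = [s for s in scores if s != 0]
    return {
        "count": len(nonzero),
        "mean": statistics.mean(nonzero),
        "median": statistics.median(nonzero),
        # Sample standard deviation; the study's choice is not stated.
        "stdev": statistics.stdev(nonzero),
    }


# Toy word scores, not study data.
toy_scores = [2, 2, 3, -1, -3, 1, -2, 2, 0, 0]
summary = summarize(toy_scores)
print(summary)
```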

Word Score Distribution Excluding Zero Score Words Figure X.3

 

Ratio of positive and negative scores Figure X.3

 

Graph of Top 25 Words with Most Common Negative and Positive Score Impact Figure X.4

Graph of top 25 most frequently used words on the pharmacy-related subreddits Figure X.5

Discussion

One of the authors noted anecdotes of academic career advisors discouraging students from pharmacy career paths. These advisors based their reasoning on their online research of forums and on the difficulty of finding a pharmacy job. This study reasoned that if the majority of online forum information from anonymous users is negative about the pharmacy profession, it may sway many prospective students and counselors away from the pharmacy career path.

This study used the AFINN sentiment analysis method to analyze the word sentiment of anonymous users in the pharmacy-related subreddits. AFINN is a sentiment word list, available as a Python 3 library, that assigns sentiment scores to words. The goal was to identify the emotions of current pharmacy professionals. From June 2019 to June 2020, a period that includes the COVID-19 crisis, the top posts in the pharmacy-related subreddits had an overall positive sentiment, which contradicts the study’s hypothesis. More positive than negative words were used in the three subreddits, at a ratio of 1.38.

The word “like” is the most frequently used positive word and biases the results toward the positive, because it is often used as an adjective, preposition, or conjunction. In those uses it may not imply a positive sentiment and may in fact be neutral. One could speculate that removing “like” would shift the word sentiment analysis of top subreddit posts toward a neutral (zero) or negative result.
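One way to test this speculation is to recompute the positive-to-negative ratio with “like” excluded. The sketch below uses made-up toy data, not the study's word counts, and the function names are hypothetical.

```python
def pos_neg_ratio(scored_words):
    """Ratio of positively scored words to negatively scored words."""
    positive = sum(1 for _, score in scored_words if score > 0)
    negative = sum(1 for _, score in scored_words if score < 0)
    return positive / negative if negative else float("inf")


def ratio_without(scored_words, excluded):
    """Recompute the ratio after dropping the given words."""
    dropped = {word.lower() for word in excluded}
    remaining = [(w, s) for w, s in scored_words if w.lower() not in dropped]
    return pos_neg_ratio(remaining)


# Toy data: 4x "like", 2x "good", 3x "pay", 1x "bad" (scores are illustrative).
toy = [("like", 2)] * 4 + [("good", 3)] * 2 + [("pay", -1)] * 3 + [("bad", -3)]
baseline = pos_neg_ratio(toy)
adjusted = ratio_without(toy, ["like"])
print(baseline, adjusted)
```

In this toy sample the ratio falls from above 1 to below 1 once “like” is removed; whether the same happens in the study data would depend on how often “like” actually occurs.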

The analysis comes with multiple limitations. The study only reviewed the top 299 postings from June 1st 2019 to June 1st 2020 and did not review posts that did not make the “top” Reddit category. Posts and comments that received little attention therefore did not influence the analysis. Another limitation of the methodology is that AFINN cannot take the context of a word into account. Sarcasm, for example, is not analyzed appropriately: the sentence “I love my job” appears positive but could be meant sarcastically. The analysis also does not include posts deleted from the subreddits. Each subreddit has moderators who may remove posts or comments that infringe upon forum rules. As many of the subreddit rules (7) prohibit negative behavior, it is possible that many negative comments or posts have been removed, resulting in a higher percentage of positive words. A further limitation is that Reddit users may not be pharmacists, pharmacy technicians, or pharmacy students, since any member of the public can post in the subreddits.

Future studies could analyze past subreddit data and compare it with the number of applications year by year. More intensive NLP studies could be conducted to ascertain how the pharmacy community on social media views the profession. Future studies could also review other subreddits, Student Doctor Network, Facebook groups, or Twitter, and could apply more advanced language processing methods.

Conclusion

This study hypothesized that the pharmacy-related subreddits would contain an overall negative word sentiment. However, it found that word sentiment in the pharmacy subreddits is positive overall. The ratio of positive to negative words was 1.38 to 1. “Like” was the most common positive word used, and “pay” was the most common negative word used. Additional studies should be conducted to better understand social media’s influence on the profession of pharmacy.


References:
  1. Student Applications, Enrollments and Degrees Conferred Reports. (n.d.). Retrieved July 04, 2020, from https://www.aacp.org/research/institutional-research/student-applications-enrollments-and-degrees-conferred
  2. Tan, T., Zhang, Y., & Kankanhalli, A. (2014). How Online Social Networks Affect my Job Choice Intention: an Empirical Approach. PACIS.
  3. Hoag, A., Grant, A.E., & Carpenter, S. (2017). Impact of Media on Major Choice: Survey of Communication Undergraduates. NACADA Journal, 37, 5-14.
  4. Support me but don’t tell me what to do (Rep.). (2014). Retrieved May 4, 2020, from Insead website: https://www.insead.edu/sites/default/files/assets/dept/centres/emi/docs/millennials-part-3-support-me-but-dont-tell-me-what-to-do.pdf
  5. Willis, S., Shann, P., & Hassell, K. (2009). Pharmacy career deciding: Making choice a "good fit". Journal of Health Organization and Management, 23, 85-102. doi:10.1108/14777260910942579
  6. Dropping common terms: stop words. Retrieved July 11, 2020 from https://nlp.stanford.edu/IR-book/html/htmledition/dropping-common-terms-stop-words-1.html
  7. Reddit Content Policy. Retrieved July 09, 2020, from https://www.redditinc.com/policies/content-policy

Authors:
Alex Barker, PharmD
Kevin Lan, PharmD Candidate

Reviewed:

Jessica Czechowski, PharmD
Laura Dee, PharmD
Nicole Salata, PharmD
Ingrid Vilimelis Piulats, PhD Candidate




Do Pharmacists Speak Well of Pharmacy? A Word Sentiment Analysis of Pharmacy Subreddits