Categories
iSchool Portfolio

Social Media Text Analysis: Reddit

Based on a student data project that used Python machine learning libraries to analyze text from the social news and discussion site Reddit, we developed a scholarly research paper to submit to the HICSS 2022 conference. Focusing on two popular advice subreddits, we manually coded posts to identify users' demographic disclosures, described whether any patterns could be observed when comparing user metrics and algorithmic scores, and speculated on the role community moderation played in interpreting study data. I was first author on the published paper and a co-author of the original student project. I performed much of the data labeling and quantitative analysis, presented our findings at the virtual conference, and offered guidance and feedback to the other authors.

Because basic statistical analysis failed to suggest any interesting patterns in the metrics or the output of the text analysis algorithms, we chose to frame the study as an exploration of behavior and identity disclosure in semi-anonymous environments. To that end, we needed to go beyond our original attempt to automatically extract demographics with text pattern matching logic, and instead manually reviewed each post to identify age and gender, if explicitly included. This allowed us to compare our forums of interest in terms of disclosure rates, community moderation, and expectations. Furthermore, we could comment on extraction methods in general, suggesting principles that future researchers could use to avoid assumptions and maximize accurate coverage of demographic information.
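To illustrate what text pattern matching of this kind might look like, here is a minimal Python sketch. It assumes the bracketed age/gender tags that posters on advice forums often write (for example, "I (25F)"). The patterns, the function name extract_self_disclosure, and the gender codes are hypothetical and are not the logic used in the study.

```python
import re

# Hypothetical pattern: an age/gender tag in brackets or parentheses,
# e.g. "(25F)", "[M 31]", "(19 NB)". Not the pattern set used in the study.
TAG = re.compile(
    r"[\(\[]\s*(?:(?P<age1>\d{1,2})\s*(?P<g1>[mf]|nb)"
    r"|(?P<g2>[mf]|nb)\s*(?P<age2>\d{1,2}))\s*[\)\]]",
    re.IGNORECASE,
)

# Only accept a tag that directly follows a first-person word, so that
# "my boyfriend (27M)" is not mistaken for the author's own disclosure.
SELF_TAG = re.compile(
    r"\b(?:I'm|I am|I|me|my)\s*" + TAG.pattern,
    re.IGNORECASE,
)


def extract_self_disclosure(text):
    """Return (age, gender) for the first self-referential tag, or None."""
    match = SELF_TAG.search(text)
    if match is None:
        return None
    age = int(match.group("age1") or match.group("age2"))
    gender = (match.group("g1") or match.group("g2")).upper()
    return age, gender


print(extract_self_disclosure("I (25F) have been dating my boyfriend (27M)."))
print(extract_self_disclosure("I'm a 25-year-old woman."))
```

A rule like this only catches one narrow convention. A disclosure written in ordinary prose, such as the second example, goes undetected, which is the kind of coverage gap that manual review addresses.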

In addition to scalability issues with demographics disclosure, we realized that we should have more carefully considered which subreddits to include in our dataset–ideally those with similar community guidelines–which may have resulted in more compelling analysis using the same algorithms and metrics data we originally collected. This may have allowed us to go beyond an exploration of demographics and disclosure. This was a scholarly work, so its effect, if any, may not be seen for some time.

Because of this project, we learned some of the limitations and opportunities for using Python to explore online communities. Personally, it was an excellent opportunity to participate in the scholarly publication process and prepare a presentation of the project.

Social Media Survey Research Plan

For my graduate-level survey research methods course, I devised a questionnaire and research proposal to investigate attitudes about personal disclosure and privacy on different social media platforms, especially among gender diverse people. I hypothesized that individuals with marginalized identities, including transgender people, would be more likely to either disguise or hide aspects of their identity or share less personal information on social media sites where those identities and/or perspectives put them at greater risk for harm, such as abuse, threats, or career problems.

As a solo project, I was responsible for developing the questionnaire, performing background research, and writing the proposal. Throughout the project, I consulted with my classmates, instructor, and external contacts to test and improve the survey instrument and overall research direction. To explore my research question, I knew I needed to collect survey data on two key areas: social media use and demographics. The specific question and answer sets for each were informed by background research, in particular several previous scholarly studies of social media and identity, as well as general knowledge of survey design best practices. 

There were three sections of social media questions, one each for Twitter, Instagram, and Facebook, each with nearly identical question and answer sets (UI copy and functionality differences between the sites reflected) so they could be directly compared. Each section began by asking respondents how often they used a given social media platform and only those who selected “I use it once a week or more often” were shown the following question set. The questions for each platform were divided into two pages to reduce participant fatigue. The first page asked participants about the type of information they share about themselves and their interests and what kind of privacy settings they use. The second, shorter page asked users to rate their agreement with five statements on a 5-point agreement scale. These statements were particularly influenced by previous research about online privacy concerns and community.

Screenshot of agreement questions from Facebook section of survey

For demographics, I wanted to include more aspects of identity than just my area of focus (gender) without seeming too invasive, while deliberately excluding answers that might identify respondents. Age was asked as a set of ranges; disability status was a simple yes/no; most questions allowed for multiple selections and every item could be skipped or marked “Prefer not to answer.” “Other” with a text field was an option for many questions, but I aimed to design the questionnaire so most respondents could easily answer without writing anything.

According to feedback from peers and survey testers, I mostly succeeded in my goal to write an easy-to-answer questionnaire that gave respondents options they felt described themselves and their point of view. Data collected with this instrument would have been readily analyzed and compared with (hopefully) minimal manual coding.

An important challenge I faced when designing this questionnaire involved writing questions and answers that were complete, accurate, AND readily answered without being time-consuming. Some of my questions asked users to recall information they might not have in mind, such as profile fields they completed. In an effort to eliminate open-ended questions entirely, I made a list of topics that people might post about, which started out very long (40 or so items) and ended up with 22 named topics, 14 of which combined two or more related topics. These labels might not have been a great fit for many respondents’ mental model of their interests, and I heard feedback from survey testers that they were confused about which boxes to check for more specific interests they had in mind. Additional user testing prior to launching the study would be ideal. Additionally, had I moved forward with this proposal, a key concern would be sampling–gender identity is not a screening question, nor can a gender diverse population be effectively quantified for the sake of random sampling techniques, so careful, targeted recruitment would be necessary to attract respondents with marginalized identities.

Screenshot of a spreadsheet with work-in-progress topic labels next to screenshot of question as it appeared in the survey

Research proposal and questionnaire (PDF) on Google Drive

Throughout this process, it was a pleasure to reconnect with survey research and quantitative analysis skills I first honed working in market research once upon a time. I learned quite a bit about the existing body of research about user behaviors and attitudes toward social media. We sometimes think of this as an emerging field, but relevant studies go back at least 20 years to online forums, newsgroups, and early social media like Friendster. The project reinforced my desire to incorporate intersectionality and inclusion principles in my work as much as possible, which is why I chose not to limit the study to participants of a particular gender identity and instead planned to collect demographics that may reflect multiple oppressions to better contextualize my (proposed) findings.

Cultural Timeline

Asked to create a timeline on a humanities-related topic of our choosing for a Fall 2020 Digital Humanities course, I selected gender nonconformity in pop culture. It is a somewhat broad and tricky topic, as the terminology and understanding of gender identity outside the binary are still evolving, especially where culture and media are concerned. The resulting interactive object includes a selection of some–but certainly not all–instances of how these identities have been represented since the 1800s or so, along with brief analysis and commentary on those representations. Featured in the iSchool’s Student Showcase.

A complete reference list follows.

Media List

Bayer, J. (2019, April 3). Prince “Symbol” [Photograph]. Flickr. https://www.flickr.com/photos/23401011@N03/40565825163

Classic_Movie_Gals. (2008, June 12). Marlene Dietrich in Seven Sinners (1940) [Movie still]. Flickr. https://www.flickr.com/photos/27534776@N07/2574399770

Drümmkopf. (2007, July 2). Left hand of darkness [Photograph]. Flickr. https://www.flickr.com/photos/30453880@N04/4171543007

Hipp, M. (2012, May 20). Angry Inch – Hedwig & the Angry Inch [Video]. YouTube. https://www.youtube.com/watch?v=qWI6E8gdBzk

JasonOnEarth. (2007, August 1). Star Trek: TNG – “The Outcast” – ‘I Am Tired of Lies’ Scene [Video]. YouTube. https://www.youtube.com/watch?v=mMqGlSjAbwA

JonSnow. (2011, June 21). Pink Flamingos, live homicide [Video]. YouTube. https://www.youtube.com/watch?v=8bHKJt8beCI

Loose Women. (2017, June 30). Eddie Izzard on Why It Was Important for Him to Come Out | Loose Women [Video]. YouTube. https://www.youtube.com/watch?v=k2rYR0n5zTQ

Movieclips Classic Trailers. (2019, August 2). Boys Don’t Cry (1999) Trailer #1 [Video]. YouTube. https://www.youtube.com/watch?v=Ar9wGSd7KVQ

Ramirez, S. [@therealsararamirez]. (2020, August 27). New profile pic. In me is the capacity to be Girlish boy Boyish girl Boyish boy Girlish girl All Neither [Photograph]. Instagram. https://www.instagram.com/p/CEZak3AHwjG/

RuPaul’s Drag Race. (2020, October 8). Every Miss Congeniality’s Entrance (Compilation) [Video]. YouTube. https://www.youtube.com/watch?v=CalCAwLOJYo

Saturday Night Live. (2013, October 14). Pat at the Office [Video]. YouTube. https://www.youtube.com/watch?v=TYkjXMpKBBQ

STAT CHILE. (2016, January 8). Sylvester – You Make Me Feel (Mighty Real) (1978) HD [Video]. YouTube. https://www.youtube.com/watch?v=Ifr13Upytb4

Steven Universe. (2017, November 22). Stevonnie Run Into Trouble At A Dance Party | Alone Together | Cartoon Network [Video]. YouTube. https://www.youtube.com/watch?v=sEELKp3jLd4

Unknown. (n.d.). Mathilde “Missy” de Morny [Photograph]. Wikimedia Commons. https://commons.wikimedia.org/wiki/File:Mathilde_%E2%80%9CMissy%E2%80%9D_de_Morny.jpg

References

Amin, K. (2013). Ghosting transgender historicity in Colette’s The Pure and the Impure. L’Esprit Créateur, 53(1), 114–130. https://doi.org/10.1353/esp.2013.0012

Anders, C. J. (2019, February 25). Exploring the genius of Ursula Le Guin’s Hainish Cycle. Tor.com. https://www.tor.com/2019/02/25/unlocking-the-full-brilliance-of-ursula-le-guins-hainish-cycle/

Anderson, R. (2013). Fabulous: Sylvester James, black queer afrofuturism and the black fantastic. Dancecult, 5(2). https://doi.org/10.12801/1947-5403.2013.05.02.15

Arroyo, B. (2014). Sexualizing the transgendered body in Hedwig and the Angry Inch and Boys Don’t Cry. Textual Overtures, 2(1).

Avilez, G. (2019). Uncertain freedom: RuPaul, Sylvester, and black queer contingency. The Black Scholar, 49(2), 50–64. https://doi.org/10.1080/00064246.2019.1581978

Brodeur, N. (2020, January 31). When your signature ‘SNL’ character isn’t funny anymore: Julia Sweeney revisits Pat. Seattle Times. https://www.seattletimes.com/entertainment/when-your-signature-snl-character-isnt-funny-anymore-julia-sweeney-revisits-pat/

Butler, J. (1999). Gender trouble: Feminism and the subversion of identity (2nd ed.). Routledge.

Chambers, B. (2018, September 10). How The Left Hand of Darkness changed everything. LiteraryHub. https://lithub.com/how-the-left-hand-of-darkness-changed-everything/

Conway, J. J. (2012, June 29). Dress-down Friday: Mathilde de Morny. Strange Flowers. https://strangeflowers.wordpress.com/2012/06/29/dress-down-friday-mathilde-de-morny/

Cooper, B. (2002). Boys Don’t Cry and female masculinity: Reclaiming a life & dismantling the politics of normative heterosexuality. Critical Studies in Media Communication, 19(1), 44–63. https://doi.org/10.1080/07393180216552

Dean, L. (2020, August 10). Queer characters find power in “She-Ra” and “Steven Universe.” Bitch Media. https://www.bitchmedia.org/article/history-of-queer-representation-in-cartoons-she-ra-korra

Dry, J. (2019, December 12). As ‘Boys Don’t Cry’ joins National Film Registry, Kimberly Peirce addresses its complicated history. IndieWire. https://www.indiewire.com/2019/12/kimberly-peirce-interview-boys-dont-cry-transgender-1202196536/

Dunn, E. (2016). Steven Universe, fusion magic, and the queer cartoon carnivalesque. Gender Forum, 56. http://genderforum.org/transgender-and-the-media-issue-56-2016/

Ellsworth, M. P. (2016, April 22). Words of liberation: Prince’s lyrics and queer identity. MTV News. http://www.mtv.com/news/2871846/prince-lyrics-queer-identity/

Feder, S. (Director). (2020, June 19). Disclosure: Trans lives on screen [Documentary]. Netflix. https://www.netflix.com/title/81284247

Fitzgerald, T., & Marquez, L. (2020). Legendary children: The first decade of RuPaul’s Drag Race and the last century of queer life. Penguin Books.

Florido, H., Mitroff, K., & Sugar, R. (Writers), & Bae, K., Kim, S., Michalka, E., & Jones-Quartey, I. (Directors). (2015, January 15). Alone Together (season 1, episode 37) [TV series episode]. In R. Sugar, W. Moreland, & C. Beaton (Executive Producers), Steven Universe. Cartoon Network Studios.

Gammel, I. (2012). Lacing up the gloves: Women, boxing and modernity. Cultural and Social History, 9(3), 369–390. https://doi.org/10.2752/147800412X13347542916620

Gudelunas, D. (2016). Culture jamming (and tucking): RuPaul’s Drag Race and unconventional reality. Queer Studies in Media & Popular Culture, 1(2), 231–249. https://doi.org/10.1386/qsmpc.1.2.231_1

Hallam, L. (2010). Monster queen: The transgressive body of Divine in Pink Flamingos. Bright Lights Film Journal. https://brightlightsfilm.com/monster-queen-the-transgressive-body-of-divine-in-pink-flamingos/

Hamel, J. (2018, May 11). The Pansy Craze: When gay nightlife in Los Angeles really kicked off. KCRW. https://www.kcrw.com/culture/shows/curious-coast/the-pansy-craze-when-gay-nightlife-in-los-angeles-really-kicked-off

Hawkins, S. (2017). The sun, the moon and stars: Prince Rogers Nelson, 1958–2016. Popular Music and Society, 40(1), 124–128. https://doi.org/10.1080/03007766.2016.1245482

Kelso, T. (2015). Still trapped in the U.S. media’s closet: Representations of gender-variant, pre-adolescent children. Journal of Homosexuality, 62(8), 1058–1097. https://doi.org/10.1080/00918369.2015.1021634

kydd, E. (1998). Star Trek: Insiders and “Outcasts.” Jump Cut: A Review of Contemporary Media, 42, 39–44. https://www.ejumpcut.org/archive/onlinessays/JC42folder/StarTrekGender.html

Le Guin, U. K. (2010). The left hand of darkness. Ace Books.

Patterson, G., & Spencer, L. G. (2017). What’s so funny about a snowman in a tiara? Exploring gender identity and gender nonconformity in children’s animated films. Queer Studies in Media & Popular Culture, 2(1), 73–93. https://doi.org/10.1386/qsmpc.2.1.73_1

Pidduck, J. (2001). The Boys Don’t Cry debate: Risk and queer spectatorship. Screen, 42(1), 97–102. https://doi.org/10.1093/screen/42.1.97

Poole, R. J. (2018). “Rise like two angels in the night:” Sexualized violence against queers in American film. European Journal of American Studies, 13(4). https://doi.org/10.4000/ejas.14078

Prince. (1984, June 25). I Would Die 4 U [Song]. On Purple Rain [Album]. Warner Bros. Records.

Richards, J. (2016, October 19). Do we need to time warp again? Queer identity and the problems with the Rocky Horror Picture Show. Bitch Media. https://www.bitchmedia.org/article/do-we-need-time-warp-again/queer-identity-and-problems-rocky-horror-picture-show

Schmidt, T. (2010). “Being cool about it”: Performing gender with Eddie Izzard. Gender Forum, 29, 20–30. http://genderforum.org/private-i-public-eye-issue-29-2010/

Schoellkopf, C. (2017, June 13). Eddie Izzard reflects on coming out as transgender, why Caitlyn Jenner is a role model. The Hollywood Reporter. https://www.hollywoodreporter.com/bookmark/eddie-izzard-reflects-coming-as-transgender-why-caitlyn-jenner-is-a-role-model-1012926

Song, L., & Tan, C. K. K. (2020). The final frontier: Imagining queer futurity in Star Trek. Continuum, 34(4), 577–589. https://doi.org/10.1080/10304312.2020.1750564

Taylor, J. (Writer), & Scheerer, R. (Director). (1992, March 16). The Outcast (season 5, episode 17) [TV series episode]. In M. Piller, G. Roddenberry, & R. Berman (Executive Producers), Star Trek: The Next Generation. Paramount Television.

The Matrix is a “trans metaphor”, Lilly Wachowski says. (2020, August 7). BBC News. https://www.bbc.com/news/newsbeat-53692435

Whiteneir, K. T. (2019). Dig if you will the picture: Prince’s subversion of hegemonic black masculinity, and the fallacy of racial transcendence. Howard Journal of Communications, 30(2), 129–143. https://doi.org/10.1080/10646175.2018.1536566

McEnany, A., Mason, T., Wachowski, L., Adler, J., Berns, A., Hernandez, T., Mattis, L., & Sweeney, J. (Executive Producers). (2019–present). Work in progress [TV series]. Circle of Confusion; Showtime Networks.

Young, E. (2019). They/them/their: A guide to nonbinary and genderqueer identities. Jessica Kingsley Publishers.

Categories
Portfolio Yahoo

Tumblr in Web Search Experiment

Explored ways to feature relevant, engaging Tumblr content in Yahoo web search results.

Screenshot of Tumblr search results experience for query "Charleston" sometime after the horrific church shooting
Tumblr in Yahoo Search results (no longer live)

Ask
After Yahoo acquired Tumblr, Search leadership asked me to find a way to feature Tumblr content in web search results.

Process
I started out with several things to consider:

  • Understanding the type of content on Tumblr
  • Determining what content, if any, could map to real web search user needs
  • Figuring out what metadata we could extract from Tumblr posts and whether it was enough to work well in our content management platform
  • Learning as much as we could from what little data the Tumblr team could share with us

Because I found little evidence in our logs of existing Yahoo search-to-Tumblr behavior, and because Tumblr’s content is freewheeling and relatively unstructured, we had to experiment.

The first test featured content from specific Tumblr users (celebrities, online personalities, organizations–entities with discrete matching queries) in a simple image carousel. This approach had two limitations. First, only image-type posts could be displayed, so blogs with text posts, links, etc. would appear with limited results or none at all, despite frequent updating. Second, we could only trigger on keywords that had a clear match to a single blog (e.g., Beyonce, ZooBorns). As a result, coverage was low, and leadership tasked us with significantly expanding the experience.

“[Emily] took on a very demanding team that wanted to create a new experience for users with Tumblr content. She patiently worked with the team and in many instances stepped in to help move the project forward. Without her it would have taken much longer to launch the experience on Search.”

Product Manager, Search

To accomplish this, I needed to rely on automatic triggering methods that offered far less control over what content appeared in search results. Despite concerns about relevance and quality, we launched a test for a small percentage of search traffic. The initial test had to be taken offline within days because, although the backend team took steps to remove content flagged as “adult,” pornographic results (and worse) slipped through.

Search leadership was determined, however, and resources were provided to dramatically improve the indexing for quality and cleanliness. The backend team also added logic for when to return content at all, based on timeliness and other factors. A visual designer was brought in to collaborate on a unique template for Tumblr that accounted for the variable types of content and included more Tumblr branding (color, logos). The UX and content improvements launched as a test for a small percentage of search traffic. Although the metrics weren’t impressive, the test didn’t cause major problems, and the feature launched for all desktop web traffic.

“Emily did an outstanding job on the Tumblr [search experience] presentation for the Tumblr team. She has built a [search experience] that puts a stake in the ground until Science can develop more precise triggering.”

Product Marketing Manager, Search

Seeking to experiment further in hopes of improving and better understanding the module’s performance, I took the initiative to categorize queries that triggered the Tumblr module and identify categories that might be well-served with Tumblr content. I used existing keyword lists roughly mapping to a dozen or so categories and set up a test bucket version of the module limited to these categories, with logging for each. I also wanted to see if other factors affected performance, including where the module appeared on the page (“slotting”) and how consistently it appeared (whether to ignore backend display logic). I tracked and compared my experiment’s performance to the primary module’s on a weekly basis, using that data to make small tweaks to each category along the way.
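The keyword-list categorization described above can be sketched roughly as follows. This is a simplified illustration in Python, not the production logic: the keyword lists, the function names categorize_query and tally_triggers, and the whole-word matching rule are all assumptions made for the example.

```python
from collections import Counter

# Hypothetical keyword lists; the real lists and their contents are not
# reproduced here. Category names are taken from the write-up.
CATEGORY_KEYWORDS = {
    "food": {"recipe", "cupcakes", "ramen"},
    "books": {"novel", "book", "author"},
    "holidays": {"halloween", "thanksgiving"},
}


def categorize_query(query):
    """Return the sorted categories whose keywords appear as whole words."""
    tokens = set(query.lower().split())
    return sorted(
        category
        for category, keywords in CATEGORY_KEYWORDS.items()
        if tokens & keywords
    )


def tally_triggers(queries):
    """Count how many queries fall into each category (a stand-in for logging)."""
    counts = Counter()
    for query in queries:
        counts.update(categorize_query(query))
    return counts


print(categorize_query("Halloween cupcakes recipe"))
print(tally_triggers(["ramen recipe", "halloween costumes", "weather tomorrow"]))
```

Per-category counts like these are what make it possible to compare one category against another week over week and decide which ones to tweak.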

Result
The great Tumblr in search experiment ended after about a year and a half, when leadership decided the investment was no longer justifiable. Despite the effort’s ultimate failure, I was recognized for my contribution and creativity.

Key categories in my final experiment did show some lift in performance: food, books, holidays, fictional characters, TV series, and movie series.