presentation | Emily N Cannon

Based on a student data project that used Python machine learning libraries to analyze text from the social news and discussion site Reddit, we developed a scholarly research paper to submit to the HICSS 2022 conference. Focusing on two popular advice subreddits, we manually coded posts to identify users’ demographic disclosures, described whether any patterns could be observed when comparing user metrics and algorithmic scores, and speculated on the role community moderation played in interpreting study data. I was first author on the published paper and co-author of the original student project. I performed much of the data labeling and quantitative analysis, presented our findings at the virtual conference, and offered guidance and feedback to the other authors.

Because basic statistical analysis failed to suggest any interesting patterns in the metrics or the output of the text analysis algorithms, we chose to frame the study as an exploration of behavior and identity disclosure in semi-anonymous environments. To that end, we needed to go beyond our original attempt to automatically extract demographics with text pattern matching logic, so we manually reviewed each post to identify age and gender, if explicitly included. This allowed us to compare our forums of interest in terms of disclosure rates, community moderation, and expectations. Furthermore, we could comment on extraction methods in general, suggesting principles that future researchers could use to avoid assumptions and maximize accurate coverage of demographic information.
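To illustrate what text pattern matching for demographics can look like, here is a minimal Python sketch. It is not the logic from our project; the pattern, the function name extract_disclosures, and the sample sentences are all hypothetical. It assumes the common Reddit convention of a two-digit age next to a gender letter inside parentheses or square brackets.

```python
import re

# Hypothetical pattern (an assumption for illustration, not the project's code):
# a two-digit age adjacent to a gender letter, in either order, wrapped in
# parentheses or square brackets, e.g. "(25F)" or "[M32]".
TAG = re.compile(
    r"[\(\[]\s*"
    r"(?:(?P<age1>\d{2})\s*(?P<g1>[MFmf])|(?P<g2>[MFmf])\s*(?P<age2>\d{2}))"
    r"\s*[\)\]]"
)


def extract_disclosures(text):
    """Return a list of (age, gender) tuples, one per bracketed tag found."""
    results = []
    for match in TAG.finditer(text):
        age = match.group("age1") or match.group("age2")
        gender = (match.group("g1") or match.group("g2")).upper()
        results.append((int(age), gender))
    return results


# A tagged disclosure is picked up in either order.
tagged = extract_disclosures("I (25F) and my boyfriend [M32] disagree.")

# A disclosure written out in prose is missed entirely by this pattern.
prose = extract_disclosures("I am 25 and female.")
```

In this sketch, the tagged sentence yields two disclosures while the prose sentence yields none. That is the kind of coverage gap a fixed pattern can leave and that manual review can catch.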

In addition to scalability issues with demographics disclosure, we realized that we should have more carefully considered which subreddits to include in our dataset, ideally those with similar community guidelines. Doing so might have produced more compelling analysis using the same algorithms and metrics data we originally collected, and might have allowed us to go beyond an exploration of demographics and disclosure. This was a scholarly work, so its effect, if any, may not be seen for some time.

“Don’t Downvote A$$$$holes”: An Exploration of Reddit’s Advice Communities in the Proceedings of HICSS
Presentation slides and notes on Google Drive

Through this project, we learned some of the limitations and opportunities of using Python to explore online communities. For me personally, it was an excellent opportunity to participate in the scholarly publication process and prepare a presentation of the project.

For my graduate-level information architecture course, I teamed up remotely with two classmates to complete a semester-long design proposal with several distinct components and deliverables, including content and business strategy, content inventory, user research plan and initial card sort task results, personas, and mock-ups or wireframes. We chose to focus on the website of the family-run California native plant nursery Las Pilitas, a treasure trove of information about gardening and nature that appears among the top Google search results for queries relevant to its somewhat niche interests, but suffers from navigation and wayfinding issues. Based on the publicly available SEO data we could find, we posited that a relatively high bounce rate reflected this poor navigation, meaning users landed on a particular page, such as a page about a popular plant family, but did not explore image galleries, information about specific varieties of the plant, advice about landscaping with these plants, and so on.

As students, my teammates and I balanced responsibilities in order to maximize learning, as the project was an opportunity to practice a variety of skills and gain experience with new tools. I created all the wireframes in Balsamiq; developed our content inventory process in Airtable; designed the report, personas, and slides; performed competitive and background research; drew the final version of the site map diagram; and did light project management/tracking in a simple spreadsheet.

Screenshot of a portion of our project tracking spreadsheet

Based on content inventory and competitive research, we knew we needed to improve the site’s overall organization so that more meaningful breadcrumbs could give visitors entering via web search a sense of where they are and what else they might explore. We expected this would also support effective “related” links on deep nodes like plant detail pages. Additionally, we wanted to explore modernizing the global navigation bar with a fat menu design. We tested a number of possible user flows on the existing site and noted pain points in accomplishing common user goals like making a purchase or finding plant information, which informed several smaller decisions in terms of buttons, tooltips, search interface, and media experience.

Final report (PDF) on Google Drive
Mock client presentation (PPTX) on Google Drive

Since this was a student project with no client contact or budget for in-depth user research, we were limited in our scope. The wireframes, labels, and personas are all effectively a first iteration and would undoubtedly evolve over the course of doing real client-contracted work. In particular, I’d like to be able to see site analytics and search traffic data to more effectively identify important entry points and stress cases for visitors and customers. Additionally, the content library is enormous, far too many documents to cover for our project, but a thorough accounting of the site’s content and how it’s internally linked would be an important starting point for a site redesign.

User flow from entry point (plant group page) to a plant detail (product) page to the shopping experience, comparing the existing experience with the proposed redesign wireframes

Translating ideas and inspiration to a coherent mock-up requires creativity and a clear sense of your users and the product. We realized that, for as much as we could do on spec, our work would only really be actionable after we could address the limitations. Despite that, the team benefitted from the collaborative design process overall and enjoyed learning new tools, like Balsamiq, Lucidchart, and Airtable, to make design ideas tangible.