Digital platforms have changed the way humans interact with each other and the world around them (Samples 2023). Facebook and Instagram have taken over social lives, X (formerly Twitter) and Google have changed how people get information, and Venmo and PayPal have transformed personal finance, to name just a few. To use any such platform, a user must agree to several contracts known as Terms of Use (TOUs), which are notorious for convoluted, difficult-to-comprehend language (Samples 2023, Samples et al. 2024). If a user is unable to read a contract, this raises the question of whether such contracts are fair. To understand trends in contract complexity and reading difficulty for consumers, this work analyzes the linguistic patterns in TOUs longitudinally.
To accomplish this objective, a dataset was compiled by scraping 323 TOUs from 21 platforms, dating from 1999 to 2024, using the Internet Archive. The full corpus consists of approximately 3 million words. The TOUs come from platforms spanning social media, finance, dating, gaming, and business. After initial data gathering and compilation, the text files were processed with several rounds of regular expression scripts to remove non-UTF-8 characters and prepare the data for further analysis and Natural Language Processing (NLP). Next, the data underwent part-of-speech tagging and analysis with the Tool for the Automatic Analysis of Syntactic Sophistication and Complexity (TAASSC), generating over 100 indices of linguistic complexity for each TOU. At present, the project focuses on eight variables associated with noun phrase and verbal complexity, in addition to results for traditional readability formulas (Flesch and FRE, obtained with the R packages quanteda and the tidyverse; Benoit et al. 2018, Wickham et al. 2019). To see how readability has changed longitudinally, the study needed a tool for dynamically viewing how each platform changed over time. To aid the researcher, this tool required the flexibility to compare different platforms, focus on varying sub-periods, and choose from varying graph styles.
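For readers unfamiliar with the preprocessing and readability steps described above, the following is a minimal sketch of the general idea. It is written in Python purely for illustration; the study itself used regular expression scripts and the R packages quanteda and the tidyverse, so this is not the authors' code. The function names (`clean_text`, `count_syllables`, `flesch_reading_ease`) are hypothetical, the syllable counter is a crude vowel-group heuristic that will differ from quanteda's counts, and the score uses the standard published Flesch Reading Ease formula.

```python
import re


def clean_text(raw: bytes) -> str:
    """Drop byte sequences that are not valid UTF-8 and tidy spacing."""
    # errors="ignore" silently discards bytes that cannot be decoded as UTF-8.
    text = raw.decode("utf-8", errors="ignore")
    # Collapse runs of spaces and tabs (but keep newlines) into one space.
    text = re.sub(r"[^\S\n]+", " ", text)
    return text.strip()


def count_syllables(word: str) -> int:
    """Rough heuristic (an assumption): one syllable per vowel group."""
    groups = re.findall(r"[aeiouy]+", word.lower())
    return max(1, len(groups))


def flesch_reading_ease(text: str) -> float:
    """Standard Flesch Reading Ease: higher scores mean easier text."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")
    syllables = sum(count_syllables(w) for w in words)
    return (
        206.835
        - 1.015 * (len(words) / len(sentences))
        - 84.6 * (syllables / len(words))
    )
```

Under these assumptions, a raw file would first be passed through `clean_text` and the result scored with `flesch_reading_ease`. Long sentences and polysyllabic words both lower the score, which is why dense legal language tends to rate as difficult.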
Using R-Shiny, this project created an app that allows the user to select any number of platforms and any number of metrics to view (Chang et al. 2024). As platforms and metrics are chosen, they are added to the chart space and to a table describing the chosen metrics and platforms. Each metric is listed with a description and an example, and each platform with a description and its number of users. The default view shows each metric on its own step chart, with each line representing a separate platform, differentiated by color. The range of years depicted in the chart autoscales to fit the selected data. When the user hovers over a data point, the app displays information about that point. Each graph can be saved as an image, zoomed in and out, and panned, thanks to the Plotly package in R. The app also offers customizable viewing options. First, the user can choose not to overlay the platforms on one graph, which separates the charts by platform. Second, the user can change the year span to home in on a particular time frame. Third, the user can choose among three graph views: a step chart, a linear model, and a LOESS model. This structure gives the researcher the flexibility to view the data in multiple ways, allowing for a thorough longitudinal trend analysis.
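The selection logic behind such an app can be sketched as follows. This is an illustrative Python sketch under stated assumptions, not the R-Shiny implementation: the long-format record layout, the platform and metric names, and every numeric value are made-up placeholders, and `select_series` and `linear_trend` are hypothetical helpers. It shows how a choice of platforms, metrics, and an optional year span could be turned into one series per metric and platform, and how the linear-model view could be derived by ordinary least squares.

```python
from collections import defaultdict

# Hypothetical long-format records: (platform, year, metric, value).
# All values are made-up placeholders, not results from the study.
RECORDS = [
    ("platform_a", 2010, "word_count", 1000.0),
    ("platform_a", 2012, "word_count", 1200.0),
    ("platform_a", 2014, "word_count", 1400.0),
    ("platform_b", 2011, "word_count", 3000.0),
    ("platform_b", 2013, "word_count", 3100.0),
    ("platform_a", 2012, "fre", 40.0),
]


def select_series(records, platforms, metrics, year_span=None):
    """Group matching records into {(metric, platform): [(year, value), ...]}."""
    series = defaultdict(list)
    for platform, year, metric, value in records:
        if platform not in platforms or metric not in metrics:
            continue
        # year_span is an inclusive (first_year, last_year) pair, or None.
        if year_span is not None and not (year_span[0] <= year <= year_span[1]):
            continue
        series[(metric, platform)].append((year, value))
    # Sort each series chronologically so it can be drawn as a step chart.
    return {key: sorted(points) for key, points in series.items()}


def linear_trend(points):
    """Ordinary least-squares slope and intercept for (year, value) points."""
    n = len(points)
    if n < 2:
        raise ValueError("need at least two points to fit a line")
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    if sxx == 0:
        raise ValueError("all points share the same year")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x
```

Keeping the data in long format, with one row per platform, year, and metric, means that adding a platform or a metric requires no change to the selection code, only additional rows.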
Since this tool was implemented, the study has begun to reveal interesting findings. Initial results show that, in general, TOUs are growing in word count across all platforms. Although there is no clear pattern of linguistic change over time when the dataset is considered collectively, metrics for platforms owned by a common corporation (e.g. Meta platforms) tend to converge. This intuitively suggests that shared corporate management leads to similar TOUs. In addition, certain platforms show increasing verbal complexity over time. Financial technology platforms (e.g. Venmo and PayPal) tend to have longer and more linguistically complex TOUs than most other platforms. This may be due in large part to the financial jargon in these TOUs, which adds more clauses and greater lexical diversity.
While this tool will continue to provide meaningful insight into TOUs, the project is also promising because the app generalizes to other longitudinal studies. The structure of the code can be easily adapted to any longitudinal dataset that has a list of metrics and a grouping variable. Example use cases include economic metrics of different countries over time, sports statistics of different teams or players over time, education metrics of different colleges over time, and more.