journal article Open Access Nov 07, 2024

Insights from an Experiment Crowdsourcing Data from Thousands of US Amazon Users: The importance of transparency, money, and data use

Abstract
Data generated by users on digital platforms are a crucial resource for advocates and researchers interested in uncovering digital inequities, auditing algorithms, and understanding human behavior. Yet data access is often restricted. How can researchers both effectively and ethically collect user data? This paper shares an innovative approach to crowdsourcing user data to collect otherwise inaccessible Amazon purchase histories, spanning 5 years, from more than 5,000 U.S. users. We developed a data collection tool that prioritizes participant consent and includes an experimental study design. The design allows us to study multiple important aspects of privacy perception and user data sharing behavior, including how socio-demographics, monetary incentives and transparency can impact share rates. Experiment results (N=6,325) reveal both monetary incentives and transparency can significantly increase data sharing. Age, race, education, and gender also played a role, where female and less-educated participants were more likely to share. Our study design enables a unique empirical evaluation of the 'privacy paradox', where users claim to value their privacy more than they do in practice. We set up both real and hypothetical data sharing scenarios and find measurable similarities and differences in share rates across these contexts. For example, increasing monetary incentives had a 6 times higher impact on share rates in real scenarios. In addition, we study participants' opinions on how data should be used by various third parties, again finding that gender, age, education, and race have a significant impact. Notably, the majority of participants disapproved of government agencies using purchase data yet the majority approved of use by researchers. 
Overall, our findings highlight the critical role that transparency, incentive design, and user demographics play in ethical data collection practices, and provide guidance for future researchers seeking to crowdsource user-generated data.

Details
Published: Nov 07, 2024
Vol/Issue: 8(CSCW2)
Pages: 1-48

Cite This Article
Alex Berke, Robert Mahari, Alex Pentland, et al. (2024). Insights from an Experiment Crowdsourcing Data from Thousands of US Amazon Users: The importance of transparency, money, and data use. Proceedings of the ACM on Human-Computer Interaction, 8(CSCW2), 1-48. https://doi.org/10.1145/3687005