Can Crowdsourcing Survive the LLM Era? A Community Survey on Human Data Collection
A recent survey of 155 researchers in NLP and related fields reveals that the increasing use of LLMs poses a significant challenge to the quality of crowdsourced data. While 44% of respondents have observed LLM usage in their collected data, many are unsure of effective mitigation strategies. Current detection methods, such as identifying stylistic patterns and rapid completion times, are considered insufficient by the research community to fully address the issue.
AI IMPACT: LLM use in data collection threatens the integrity of research datasets, necessitating new detection and mitigation strategies.