An Unsupervised Machine Learning Analysis of Environmental Su ...
Abstract
Large corporations contribute to waste proliferation and global environmental degradation; however, more research is needed to understand these trends. Unsupervised machine learning can provide unique insights into how firms cluster on environmental sustainability indicators relevant to the circular economy (CE) framework (i.e., metrics related to waste reduction and promotion of reuse). This study uses data from 2010 to 2019 for 2,485 global companies to explore how firms cluster on environmental sustainability metrics and whether a firm’s headquarters location and industry determine its clustering. The study applied the k-means unsupervised machine learning algorithm, followed by one-way ANOVA and chi-square tests comparing key variables (e.g., industry, location of headquarters, total environmental cost, and progress toward UN Sustainable Development Goal targets 12.2, 14.1, and 14.2) across clusters. The k-means algorithm grouped firms into four distinct clusters: the majority of firm-year observations fell into a single cluster (n = 13,313), while a small minority of firm-years (n = 39) with the most negative CE and environmental impacts formed a cluster of their own. Firms headquartered in the European Union were not more likely to be in the most sustainable cluster, whereas firms in extractive industries (e.g., fossil fuel and mining) were more likely to be in the least sustainable cluster. These results provide a proof of concept for applying unsupervised machine learning to group firms based on environmental sustainability metrics and can help policymakers identify key factors that could influence firms to adopt business practices aligned with CE goals worldwide.
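The two analytic steps named above (k-means clustering, then a chi-square test of cluster membership against a categorical variable such as industry) can be illustrated with a minimal, standard-library Python sketch. This is not the study's implementation: the toy data, the two-cluster setting, the farthest-first initialisation, and the names `kmeans` and `chi_square_statistic` are all assumptions made for illustration.

```python
from collections import Counter


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, n_iter=100):
    """Lloyd's algorithm; returns (labels, centroids)."""
    # Assumption: deterministic farthest-first initialisation, chosen only
    # so that this sketch is reproducible without a random seed.
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(sq_dist(p, c) for c in centroids))
        )

    def assign(cents):
        # Assignment step: index of the nearest centroid for every point.
        return [min(range(k), key=lambda c: sq_dist(p, cents[c])) for p in points]

    for _ in range(n_iter):
        labels = assign(centroids)
        # Update step: move each centroid to the mean of its members.
        updated = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                updated.append(tuple(sum(col) / len(members) for col in zip(*members)))
            else:
                updated.append(centroids[c])  # keep an empty cluster's centroid
        if updated == centroids:
            break  # converged: centroids no longer move
        centroids = updated
    return assign(centroids), centroids


def chi_square_statistic(labels, categories):
    """Pearson chi-square statistic and degrees of freedom for the
    cluster-by-category contingency table."""
    n = len(labels)
    observed = Counter(zip(labels, categories))
    rows, cols = Counter(labels), Counter(categories)
    stat = 0.0
    for r in rows:
        for c in cols:
            expected = rows[r] * cols[c] / n
            stat += (observed[(r, c)] - expected) ** 2 / expected
    return stat, (len(rows) - 1) * (len(cols) - 1)


# Toy firm-year observations (hypothetical): two standardised impact metrics.
low_impact = [(0.1 * (i % 5), 0.1 * (i // 5)) for i in range(40)]
high_impact = [(5.0 + 0.1 * i, 5.0 + 0.1 * i) for i in range(4)]
points = low_impact + high_impact
# Hypothetical industry labels, aligned with `points`.
industries = ["other"] * 38 + ["extractive"] * 5 + ["other"]

labels, centroids = kmeans(points, k=2)
stat, dof = chi_square_statistic(labels, industries)
print(Counter(labels))  # cluster sizes
print(stat, dof)        # compare stat with 3.84, the 5% critical value for 1 dof
```

The toy data mirror the pattern the abstract reports, with one large cluster and one very small high-impact cluster, but the expected cell counts here are too small for the chi-square approximation to be reliable, so the statistic is illustrative only. An analysis at the scale described would more plausibly rely on library routines such as scikit-learn's `KMeans` and SciPy's `chi2_contingency` and `f_oneway` than on hand-written loops.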