As someone who likes maths and spends far too much time asking AI to generate ideas, I found this article on model collapse really interesting. It outlines what can happen to AI models when they are indiscriminately trained on AI-generated data: they lose their ability to reflect the real world. Avoiding the maths, here is my read of the three distinct but interconnected things going on:
Models lose touch with rare events because AI-generated data prioritises the most common patterns. This causes models to forget critical but less frequent data points. That could mean failing to recognise the experiences of marginalised groups, whose representation in data is statistically less frequent but socially vital (side note: ask an LLM to create a list of 1000 random people and to state their gender).
Low-sophistication models fail to capture the richness of human language and ideas because they simplify complex patterns. This over-simplification shrinks the diversity of generated outputs (not just the low-frequency ones), further reinforcing bland and generic responses. Think of generated human faces with fine details like freckles smoothed out, because the model lacks the capacity to articulate them.
Imperfect training processes lead to errors in learning: the model has the capacity to understand the pattern but isn't trained correctly, so it outputs an error. These errors can compound over generations of training on synthetic outputs, causing distortions to multiply. It's like having all the ingredients to make a souffle but only knowing how to make an omelette, and then telling someone how to make a souffle.
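For anyone who does want a little of the maths, here is a toy sketch of the first effect only (rare events dropping out). It is my own illustration, not the article's method: the "model" is just a frequency count, and the category names, the 80/20 split, the sample size and the number of generations are all made-up assumptions. Each generation samples from the previous generation's output and refits on those samples alone.

```python
import random
from collections import Counter


def resample_generation(probs, n_samples, rng):
    """Sample from the current distribution, then refit by counting frequencies."""
    categories = list(probs)
    weights = [probs[c] for c in categories]
    draws = rng.choices(categories, weights=weights, k=n_samples)
    counts = Counter(draws)
    # A category that was never drawn gets no entry, so it can never come back.
    return {c: counts[c] / n_samples for c in counts}


def simulate(probs, generations=30, n_samples=200, seed=0):
    """Return the distribution at every generation, starting with the real one."""
    rng = random.Random(seed)
    history = [probs]
    for _ in range(generations):
        probs = resample_generation(probs, n_samples, rng)
        history.append(probs)
    return history


# Hypothetical "real world": one common pattern and 20 rare ones (assumed numbers).
real = {"common": 0.80}
real.update({f"rare_{i}": 0.01 for i in range(20)})

history = simulate(real)
for gen in (0, 10, 20, 30):
    print(f"generation {gen}: {len(history[gen])} categories still represented")
```

Because a category with zero samples can never reappear, the number of surviving categories can only stay flat or fall from one generation to the next. Mixing some of the original data back in at each step would be one way to slow that down, which is roughly why labelling human-generated data matters.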
The authors advocate clearly identifying and labelling data as human-generated or not, to avoid model collapse. But imagine a scenario where AI-generated data couldn't be used to create new models and we'd run out of human-generated data. How would the industry continue to develop?
Lessons could be learnt from the car industry after the oil crisis. It drove new ways to innovate, but the period immediately afterwards was termed the "Malaise" era, characterised by poor products and consumer dissatisfaction.
I might dump my AI-related stock when I stop seeing freckles in AI-generated imagery. 🚗🤖
#AI #FutureOfAI