Hey PaperLedge crew, Ernis here, ready to dive into some fascinating research! Today, we're exploring whether those super-smart Transformer models – think the brains behind a lot of AI magic – can actually learn how random numbers are generated. Now, you might be thinking, "Random is random, right?" Well, not exactly!
We're talking about pseudo-random number generators, or PRNGs. These are little algorithms that computers use to create sequences of numbers that look random, but are actually based on a specific formula. Think of it like a magician's trick - it looks like magic, but there's a method behind it.
This particular paper focuses on something called Permuted Congruential Generators, or PCGs. Now, that sounds like a mouthful, but essentially, PCGs are like souped-up versions of older PRNGs. They take a simple formula and then add a bunch of extra steps – shuffling bits around, flipping them, and chopping off pieces – to make the sequence even harder to predict. The goal is to prevent people from guessing the next number in the sequence. It's like trying to guess what a cake is made of after it's been baked, frosted and decorated!
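If you're curious what that "souped-up" formula actually looks like, here's a rough Python sketch of one PCG-style step: a simple congruential core, followed by the bit shuffling, flipping, and truncation described above. The constants follow O'Neill's public PCG32 reference, but treat this as an illustration, not necessarily the exact generator variant studied in the paper:

```python
def pcg32_step(state: int) -> tuple[int, int]:
    """One step of a PCG-style generator: LCG core + output permutation.
    Constants follow O'Neill's PCG32 (XSH-RR) reference; this is an
    illustrative sketch, not a vetted implementation."""
    MULT = 6364136223846793005
    INC = 1442695040888963407        # any odd increment works
    MASK64 = (1 << 64) - 1

    old = state
    state = (old * MULT + INC) & MASK64                     # the "congruential" part
    xorshifted = (((old >> 18) ^ old) >> 27) & 0xFFFFFFFF   # shuffle and flip bits
    rot = old >> 59                                         # state-dependent rotation
    out = ((xorshifted >> rot) | (xorshifted << ((32 - rot) & 31))) & 0xFFFFFFFF
    return state, out                # 64-bit hidden state, truncated 32-bit output
```

Note that the 64-bit state gets chopped down to a 32-bit output, so an observer never sees the whole state. Truncating even further, down to a single bit, is just `out & 1` — which is exactly the harder setting the researchers tested, as we'll see in a moment.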
So, what did the researchers do? They basically threw these PCG-generated sequences at Transformer models to see if they could figure out the pattern. And guess what? The Transformers were surprisingly good at it! Even when the sequences were super long and complex, the model could predict the next number with impressive accuracy.
The researchers even made things tougher by truncating the output to just a single bit! Imagine trying to predict the weather based on whether a coin flip lands on heads or tails. It's tough, but the Transformers could still do it!
It's like the Transformer is learning to see the underlying code of reality from a very limited perspective.
One of the coolest findings: a single model could learn multiple types of PRNGs at the same time. It's like teaching a child to speak both English and Spanish. The kid can learn both languages, picking up the similarities and differences between them. Similarly, the Transformer could identify the patterns in different PCGs and use them to predict the next numbers.
The researchers also found a relationship between the size of the numbers the PCG was generating (the modulus) and how much data the Transformer needed to learn the pattern: the amount of training data required grows with the square root of the modulus. It's like saying the effort to crack a safe grows with the square root of its size.
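In code, that scaling law is just a square root. The constant factor below is hypothetical (the paper reports the square-root dependence, not a specific constant), but it makes the key consequence easy to see: quadrupling the modulus only doubles the data needed.

```python
import math

def data_needed(modulus: int, c: float = 1.0) -> float:
    """Hypothetical illustration of the reported scaling law:
    training data required ~ c * sqrt(modulus). The constant c is
    made up here; only the sqrt dependence comes from the paper."""
    return c * math.sqrt(modulus)
```

So going from a modulus of 1,024 to 4,096 (4x bigger) only demands about 2x the training data — sub-linear, which is part of what makes the result surprising.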
But here's the kicker: when the numbers got really big, the Transformers struggled. They needed a little help in the form of something called curriculum learning. Think of it like teaching someone to run a marathon. You don't just throw them into the race; you start with shorter distances and gradually increase the mileage. The researchers found that training the Transformers on smaller numbers first helped them learn the patterns for larger numbers.
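To make the marathon analogy concrete, here's a minimal sketch of what a curriculum schedule might look like: train on small moduli first, then progressively larger ones. The function name and schedule shape are made up for illustration — the paper's actual training setup will differ in its details:

```python
def curriculum(moduli, steps_per_stage):
    """A minimal curriculum-learning schedule sketch: yield training
    stages from the smallest (easiest) modulus to the largest.
    Illustrative only; not the paper's exact schedule."""
    for m in sorted(moduli):             # easiest (smallest) first
        for step in range(steps_per_stage):
            yield m, step                # caller trains one batch per yield
```

The idea is simply that the model banks what it learned on small-modulus sequences before facing the big ones, instead of starting cold on the hardest case.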
Finally, the researchers took a peek inside the Transformer's "brain" – specifically, the embedding layers. And they found something really interesting: the model was spontaneously grouping the numbers into clusters based on how their bits were arranged. This suggests that the Transformer was learning a deeper understanding of the underlying structure of the numbers, which allowed it to transfer its knowledge from smaller numbers to larger numbers.
It's like learning the alphabet: you might start by grouping letters based on how they look (straight lines vs. curved lines). The Transformer was doing something similar with the bits in the numbers.
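To give a feel for the kind of bit-level structure those embedding clusters could reflect, here's one simple example: grouping integers by how many of their bits are set. This is just an illustration of "clustering numbers by bit arrangement" — not the paper's actual analysis of the embedding layers:

```python
def group_by_popcount(n_max: int) -> dict[int, list[int]]:
    """Group the integers 0..n_max-1 by their number of set bits —
    one simple bit-level structure that clustered embeddings might
    reflect. Illustrative only."""
    groups: dict[int, list[int]] = {}
    for n in range(n_max):
        groups.setdefault(bin(n).count("1"), []).append(n)
    return groups
```

Structure like this is modulus-independent, which hints at why a model that picks it up on small numbers could transfer that knowledge to larger ones.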
So, why does all this matter? Well, a few reasons:
- For AI researchers: It helps us understand how these powerful Transformer models learn and generalize.
- For cybersecurity folks: It highlights potential vulnerabilities in the random number generators we use to secure our systems. If an AI can crack the code, so could a hacker!
- For anyone curious about the nature of randomness: It shows that even things that seem random might have underlying patterns that can be learned.
This research raises some really interesting questions. For example:
- Could we use this knowledge to design even better random number generators that are harder for AI to crack?
- Could we use these same techniques to learn other types of complex patterns in data?
- What are the broader implications of AI being able to find order in what we perceive as randomness?
Food for thought, right PaperLedge crew? Until next time, keep learning and stay curious!
Credit to Paper authors: Tao Tao, Maissam Barkeshli