I have spent some time exploring large language models and trying to get my head around the technology. And… I still don’t feel I have an intuition for what I am dealing with, despite my fairly diverse technical education.
For a technical person, using LLMs is a bit like being used to cooking dinner with certain ingredients and then suddenly discovering a new spice. With this new flavour everything is quick to make and tastes great! Well, about 70% of the time; the rest of the time it’s bland or tastes a bit like old socks. And what is odd is that it is often not possible to tell beforehand what the result is going to be like. So watch out if you have a date over for dinner: never use this spice when you need to be 100% sure.
I tried diving into the technical details underlying this technology. They are difficult and abstract, and they have not helped me gain a useful intuition. Then again, if I think about it, how could I get a simple intuition for a system trained on a huge share of what humanity has ever written? I guess I shouldn’t be surprised…
So since it’s hard to make sense of it, maybe it’s easier to understand what it cannot do? Maybe if we understand the limits of the technology we can get a sense of it? Where are the boundaries of this special spice, so I can do my job of recommending how best to use it?
People who know me might be surprised, but this was not a simple, painless process. Part of me does not like the randomness at the foundations of these tools. You can ask the same question multiple times and get different answers, which makes it frustrating to shape clear thinking around them. And when you stretch a question over a longer text, the answers can start to “hallucinate”, which basically means the system produces confident-sounding text that is wrong. To make matters worse, so much of the technology is hidden from the developer in inaccessible training data.
What to do?
It was my business partner, Tom, who reminded me of the joy of doing stupid things with computers.
“Yates, you just have to act like you are a kid trying to impress your uncle.”
“Wow, thanks…”
“You remember in the early days, even before Unit9? Remember we replaced all the sounds of hacked video games with whistles, farts and curses?”
“Ha, ha, ha… that was so stupid! Yes I remember that.”
“You need to keep that weird spirit and tell people the story of it while you go along.”
Tom has incredible enthusiasm, a driving energy that I sometimes forget we started out with. He also suggested I start making videos of this stuff, but I’ll get to that at some later time.
So back to playing like a child, doing my best to take things as they are, follow my gut and try to create some sort of understanding.
This is what led me to making weird LLM games. You can use simple instructions to turn these huge systems into text games. What is interesting about this medium is that, in a game, you need to tie things down: they either work or they don’t. By using the right prompt to create a game I can gain some intuition about how LLMs work and how they might be used to solve problems, and it’s in a format that is easy to share with others.
Dear reader, if you have a moment, I recommend you try creating a game yourself: start from a simple sentence, then adjust it and test it over and over. You will quickly find that small changes in wording lead to completely different results. And during gameplay, an assumption that seemed obvious will fail, break the game, and require some more tuning.
In any case, here I present what feels like a classic. It’s a stupid idea, but a useful stupid idea. The concept is that the human player and the AI each try to pretend to be the other while answering questions. The resulting game is very interesting to me. It is also very hard, so please be patient and do not take this on as a relaxing activity. This game will test your understanding of the alien sort of “intelligence” we are dealing with in LLMs.
This is an epic battle: it will take hours and give you a headache, but it will also make you smile when you get it right. This is a trip into another dimension. It is tempting to copy and paste the questions into other LLMs for assistance, but please don’t cheat, or you will miss discovering some of the strangest aspects of the technology.
As a general explanation: you and the AI take turns asking a question. You ask the AI a question that it answers as if it were a human, and you then score how well it did and argue your rating. The AI then asks you a question, expecting you to answer as if you were an LLM. So you really have to pretend to be an LLM in your writing, and it’s not easy. To see how this works, you can read a long transcript of a game linked at the bottom of this post.
If you actually reach 50 points and win a game, please share your transcript, which I would like to use for research. As an incentive, I will make a 10 euro donation to a charity we can discuss.
If you find ways the game breaks, please share those too so I can improve the simple prompt. My purposes here are didactic. This is the ultimate challenge (so far), good luck…
PS: I had to add the constraint of the “one-sentence test question” or the game becomes absolutely impossible to solve. If you want to make the game harder, just relax, ever so slightly, the limit on the number of sentences that the AI can use. At some point it will present you with computer code, asking you to make fixes.
# The Impersonation Game: A Human vs. LLM Showdown by Yates Buckley
## Game Rules Prompt (Copy and Paste)
Let’s play a game in which I, the human, will pretend to be a Large Language Model (LLM), and you (the assistant) will pretend to be a human when answering test questions.
- We will take turns: Player 1 asks a one-sentence test question, Player 2 responds, and Player 1 rates the credibility of the answer.
- If Player 2 disagrees with the rating, they can appeal, and Player 1 must justify or amend the rating.
- The consensus rating is added to Player 2’s score.
- The next turn, roles swap: Player 2 asks a one-sentence question, and Player 1 responds.
- Scores are tallied, and the first to reach 50 credibility points wins.
## Strategy note: The game is most likely won by asking targeted questions that make impersonation difficult.
An example transcript of an actual game is linked here.
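If you want to keep score outside the chat window, the turn and scoring rules from the prompt can be sketched in a few lines of Python. This is only a rough illustration and not part of the game itself: the `Game` class, the `play_turn` method and its parameter names are made up for this example, the human is assumed to ask first, and the ratings of 0 to 10 in the usage lines are an assumption, since the prompt does not fix a rating scale.

```python
WIN_SCORE = 50  # from the rules: first to reach 50 credibility points wins


class Game:
    """Hypothetical score keeper for the Impersonation Game."""

    def __init__(self):
        # Assumption: the human asks the first question.
        self.players = ("human", "assistant")
        self.scores = {"human": 0, "assistant": 0}
        self.asker_index = 0

    def play_turn(self, rating, amended_rating=None):
        """Record one turn; return the winner's name, or None if nobody has won.

        rating: the asker's initial credibility rating of the answer.
        amended_rating: the rating agreed after an appeal, if there was one.
        """
        responder = self.players[1 - self.asker_index]
        # The consensus rating is what counts: the amended one after an appeal,
        # otherwise the initial one.
        consensus = rating if amended_rating is None else amended_rating
        self.scores[responder] += consensus
        # Roles swap for the next turn.
        self.asker_index = 1 - self.asker_index
        return responder if self.scores[responder] >= WIN_SCORE else None


game = Game()
game.play_turn(7)                    # human asks, assistant answers, rated 7
game.play_turn(4, amended_rating=6)  # assistant asks, human appeals 4 up to 6
print(game.scores)                   # {'human': 6, 'assistant': 7}
```

The sketch only does the bookkeeping. The hard part of the game, asking the question, answering in character and arguing the rating, still happens in the conversation.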