Divining Dreams and Exploring LLMs
To better understand large language models (LLMs), I put together a free dream interpreter called Dream Diviner. This post covers building the app, what I learned about LLMs, and some reflections on this iteration of AI.
The past six months are not the first time computer scientists and the public have been excited about artificial intelligence. Apple’s Siri received much fanfare when it shipped with the iPhone 4S in 2011. Our new future of smart assistants was here, voice was the new interface, and the mouse-and-keyboard combination was headed for the junkpile.
Fast-forward 12 years to today and human-computer interaction doesn’t look that much different. Siri is good for setting timers and reminders, but that’s about it. Amazon’s Alexa is internally considered a “colossal failure” that burns $10 billion per year. It appears we got out over our skis, once again.
That said, artificial intelligence and machine learning have found some interesting uses in human-in-the-loop systems. For example, placing an AI “barrier” in front of customer support lets humans get involved only in the cases where a human adds real value. If a customer just needs a link to a help doc, a chatbot is perfectly capable of forwarding that link.
This Time It’s Different?
With the release of ChatGPT in November 2022, OpenAI gifted the world a conversational interface for its GPT-3 language model. GPT-3 was initially released in June 2020, but it wasn’t until it got a chat interface that the world cared. A great interface mattering more than a general-purpose tool is a point I’ll come back to in this post.
The OpenAI + Microsoft alliance seems to have a head start, which has Facebook and Google playing catch-up. Stock prices move in response to each company’s AI demos. Others have pointed out that Web3 grifters are now all-in on AI, the next Big Thing™. The hype cycle is in hyperdrive.
And the current technology is definitely impressive. I’ve personally been surprised by its ability to generate boilerplate for things like Rails applications and Terraform resources, or to answer esoteric API questions. It does an admirable job with combinatorial programming problems, like figuring out how to make multiple Ruby gems play nicely together. (“Please write a Sidekiq job that iterates over the children of an ActiveRecord object and updates each child’s updated_at timestamp to now.”)
It also handles combinatorial, editorialized prompts well, which is part of what makes it so fun and surprising. (“Rewrite this email in the style of Shakespeare.”)
But beyond the whimsy of a smart chatbot, I haven’t yet seen a truly novel experience where using ChatGPT, GPT-3, or something like them is demonstrably better than what currently exists. Benedict Evans perhaps says it best:
Microsoft & Google adding generative AI into office apps is a classic pattern of incumbents making the new thing a feature. But the new thing generally also enables completely new ways to solve the problem. ‘Easier spreadsheets’ is less important than ‘why is that a spreadsheet?’— Benedict Evans (@benedictevans) March 16, 2023
It will take time to understand who will be the bookkeepers to this technology’s spreadsheet, and if any truly new experiences are possible and sustainable with this technology.
A Useful Tool
Based on my understanding of the underlying technologies and their dangers (hallucinations and the Waluigi effect, among others), ChatGPT and tools like it excel where humans remain very much in the loop. And that’s just fine. Airplanes still have humans in the loop (two-ish), which is far fewer than the hundreds needed to sail a ship in the 17th century.
That’s where and how we’ll see AI-driven improvements: by augmenting existing processes with humans in the loop. Where a 10x or 100x productivity gain can be realized, this might start to reasonably replace jobs (or make jobs so much more efficient that one doesn’t need to hire as many software engineers, for example). Because these models need to be supervised by humans, most applications of this generation of AI will work in the realm of increasing efficiency rather than delivering something net-new. That may not seem as exciting as a robot automating your life, but it is progress nonetheless.
Leaning into the Faults
So this round of AI improvements and hype may be overblown for the time being. But with any system, it’s always fun (and sometimes profitable) to exploit its weaknesses. These LLMs are known to be overconfident: they will confidently state things that are flat-out wrong. This has been called “hallucination.”
This is where your dear author decided to lean in and use this as an excuse to improve my own understanding of what’s out there today. Dream interpretations were raised as something that ChatGPT was particularly good at. My thinking was as follows:
- ChatGPT is a general-purpose tool, which makes it difficult to approach.
- Purpose-specific interfaces that map directly to a person’s intent will likely make for much better applications of AI than a master-of-all-trades interface, even if they are powered by the same technology underneath. This is the Mario-Fire Flower problem.
- It helps when the stakes are low and something can be built in a weekend.
So, enter Dream Diviner. It’s a small site meant to interpret dreams. I’ve left the door open to charging for it, just in case it becomes popular. The concept leans into the hallucination flaw and has fun with it. The resulting interpretations are pretty reasonable.
At a technical level, this is little more than an API call to OpenAI. It’s built on Ruby on Rails (of course) and is all of one controller.
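That call can be sketched with nothing but the Ruby standard library. This is a hedged sketch rather than the app’s actual code: the helper name is mine, the prompt text is abbreviated, and the completions endpoint and model name reflect OpenAI’s API as of early 2023, so check the current API reference before relying on them.

```ruby
require "json"
require "net/http"
require "uri"

# Abbreviated stand-in for the full prompt shown later in this post.
PROMPT_PREAMBLE = "We are interpreting dreams provided by the users. " \
                  "Here is the user-provided text: "

# Build (but do not send) an authenticated completions request.
# Endpoint and model name are assumptions based on the early-2023 API.
def build_completion_request(dream_text, api_key:)
  uri = URI("https://api.openai.com/v1/completions")
  request = Net::HTTP::Post.new(uri)
  request["Content-Type"] = "application/json"
  request["Authorization"] = "Bearer #{api_key}"
  request.body = {
    model: "text-davinci-003",
    prompt: PROMPT_PREAMBLE + dream_text,
    max_tokens: 1500
  }.to_json
  [uri, request]
end

# To actually send it (requires a real key and network access):
#   uri, request = build_completion_request(dream, api_key: ENV["OPENAI_API_KEY"])
#   response = Net::HTTP.start(uri.host, uri.port, use_ssl: true) { |http| http.request(request) }
#   text = JSON.parse(response.body).dig("choices", 0, "text")
```

Keeping the request-building separate from the sending makes the controller easy to test without hitting the network.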
The most interesting technical challenge was creating the prompt to send to OpenAI. The responses one gets from these LLMs are interesting in how unstructured they are. I wanted the site to offer both a short and a long interpretation of each dream, so I had to describe that structure ahead of time. At the time of writing, here is the prompt I’m using:
We are interpreting dreams provided by the users. Dream interpretations should try to convey the meaning of each dream while providing positive feedback and how each dream may relate to a user's day. The answer should be separated into 2 distinct chunks: SHORT and LONG. The SHORT answer should be less than 100 words, while the LONG answer can be up to 1,000 words. Users will receive the SHORT response for free, but may have to pay for the LONG answer in the future. Here is the user-provided text:
The AI dutifully separates the response into SHORT and LONG, which is enough structure to string-split the response and save it to the database. There are other concepts and knobs that can be tweaked through the API that I do not yet understand, and that’s okay. For how powerful it is, I’ve been surprised at how approachable this API is.
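The string-split itself can be as simple as the sketch below. It assumes the model labels its chunks “SHORT:” and “LONG:” as the prompt requests; real responses can vary, so a production version should handle missing or reordered sections.

```ruby
# Split an LLM reply into [short, long] interpretations, assuming the
# reply labels its two chunks "SHORT:" and "LONG:". The label format is
# an assumption based on the prompt, not a guarantee from the API.
def split_interpretation(text)
  short, long = text.split(/^LONG:/, 2)   # cut at the line starting "LONG:"
  short = short.to_s.sub(/\ASHORT:/, "").strip
  [short, long&.strip]
end
```

Returning `nil` for the long half when the label is absent lets the controller decide whether to retry the request or show only the free SHORT interpretation.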
This year’s evolution of AI is being heralded as a step-change. The FAANG companies are tripping over themselves to upend roadmaps and ship products at breakneck speed. The amount of competition in this realm has been remarkable. Competition is messy, however, and often doesn’t benefit those in second or third place. It does benefit the consumers of these APIs, though. The underlying technology is indeed remarkable, but we will likely look back on this time as a period driven by hype.
In the hype, though, a few sharp folks will likely find some killer applications for this generation of AI tooling. My bet is that it will come in the form of drastically increased efficiencies in places opaque to the outside world: some expensive internal processes of companies will become 10x or 100x more efficient. These savings may either be passed onto shareholders or re-invested into more R&D.
Until a version of GPT-N can replace a human, that is.
Special thanks to ChatGPT for reading early versions of this blog post and providing feedback.