How Gen-AI Is Getting Faster And Learning To Admit It Doesn't Know
DataStax CTO Jonathan Ellis on the breakneck pace of AI innovation
The arrival of user-friendly large language models has shoved the possibilities of generative AI firmly into the public consciousness, suddenly adding whole new dimensions to search, information retrieval, translation, programming, art, design and writing.
These capabilities have led some analysts to predict a 27% annual growth rate for the AI market over the next ten years.
But even their biggest fans would have to admit that these applications are not perfect. Sometimes they resemble a compulsive liar who can't resist injecting a falsehood into their narrative, just to see if they can get away with it. They also move at a glacial pace, especially with more complex prompts. You can almost see them thinking.
"Computers can do many things way faster than humans, but in this case the state of the art is slow," says Jonathan Ellis, CTO and cofounder of DataStax.
Ponderous output may not matter too much for personal use, but for businesses it could be a blocker.
Upping the pace
LLMs' relatively sluggish response is a consequence of the transformer architecture. Models like ChatGPT can only predict one word at a time, and each new word is generated in light of how the previous ones have changed the context. But there are ways to speed things up. One is to add a pre-processing stage, building vector search indexes when ingesting data and using a method called quantization, which effectively compresses the information so there are fewer steps to take when the model acts on a prompt. Vector search can now encode sentences or entire paragraphs rather than just words, and disk I/O and memory optimizations can further alleviate processing bottlenecks.
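To make the quantization idea concrete, here is a minimal sketch of scalar quantization, the simplest form of the compression described above: each 32-bit float in a vector is mapped to an 8-bit integer, shrinking the index roughly fourfold so more of it fits in memory. The class and method names are illustrative, not any particular library's API, and production systems typically use more sophisticated schemes such as product quantization.

```java
// Illustrative scalar quantizer: compresses float vectors to one byte per
// dimension, trading a little precision for a 4x smaller index.
class ScalarQuantizer {
    private final float min;
    private final float scale;

    // Callers supply the expected range of vector components.
    ScalarQuantizer(float min, float max) {
        this.min = min;
        this.scale = (max - min) / 255f;
    }

    // Map each float component into an unsigned byte in [0, 255].
    byte[] quantize(float[] vector) {
        byte[] out = new byte[vector.length];
        for (int i = 0; i < vector.length; i++) {
            int q = Math.round((vector[i] - min) / scale);
            out[i] = (byte) Math.min(255, Math.max(0, q)); // clamp to range
        }
        return out;
    }

    // Approximate reconstruction; some precision is lost by design.
    float[] dequantize(byte[] codes) {
        float[] out = new float[codes.length];
        for (int i = 0; i < codes.length; i++) {
            out[i] = (codes[i] & 0xFF) * scale + min; // & 0xFF: treat byte as unsigned
        }
        return out;
    }
}
```

Searching over the compressed codes touches a quarter of the memory that full-precision floats would, which is exactly the kind of saving that lets an index hold far more points per machine.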
These are the approaches taken by Microsoft's DiskANN vector search algorithm, one of several competing techniques, which claims to be able to "Index 5-10X more points per machine than the state-of-the-art DRAM-based solutions," along with improved accuracy and lower use of memory.
Vector search is a critical part of AI applications, retrieving data and context and feeding it rapidly to LLMs so they have enough information to answer a query accurately. ANN stands for "approximate nearest neighbor," a popular search technique that offers a workable balance between search speed, implementation difficulty, resource usage and cost. Several competing ANN algorithms exist, including DiskANN, Hierarchical Navigable Small World graphs (HNSW), and the Navigating Spreading-out Graph (NSG), each of which has its adherents. And there are approaches like Microsoft's SPANN, as used in Bing, which doesn't use a graph at all and claims to be twice as quick as DiskANN.
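What all those ANN algorithms approximate is exact nearest-neighbor search, which must compare the query against every stored vector. The brute-force baseline below shows why that doesn't scale: its cost grows linearly with the corpus, whereas graph-based indexes like DiskANN and HNSW visit only a tiny fraction of the points. This is an illustrative sketch, not code from any of the libraries named above.

```java
import java.util.List;

// Exact nearest-neighbor search by brute force: the O(n) baseline that
// approximate (ANN) algorithms trade a little accuracy to avoid.
class BruteForceSearch {
    // Cosine similarity between two equal-length vectors.
    static double cosine(float[] a, float[] b) {
        double dot = 0, na = 0, nb = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            na += a[i] * a[i];
            nb += b[i] * b[i];
        }
        return dot / (Math.sqrt(na) * Math.sqrt(nb));
    }

    // Scan every stored vector and return the index of the most similar one.
    static int nearest(List<float[]> corpus, float[] query) {
        int best = -1;
        double bestSim = Double.NEGATIVE_INFINITY;
        for (int i = 0; i < corpus.size(); i++) {
            double sim = cosine(corpus.get(i), query);
            if (sim > bestSim) {
                bestSim = sim;
                best = i;
            }
        }
        return best;
    }
}
```

At a few thousand vectors this scan is fine; at the billions of points DiskANN-class systems target, it is hopeless, which is why the field has converged on approximate graph and partition-based indexes.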
In short, there's a lot happening in vector search, and it's happening fast.
Machine learning for Java developers
Ellis recently open-sourced JVector, a vector search engine that's based on DiskANN but written in Java. DataStax claims it is 10 times faster than Apache Lucene, the established Java search library, especially on large datasets.
JVector has been powering vector search in DataStax Astra, the company's cloud DBaaS offering, for a few months. Like Cassandra, the database behind Astra, it's written in Java, a language that's hugely popular in the enterprise but under-served in AI/ML.
"There's very, very little for Java developers to use in the machine learning space because everything's written in C++ and exposed with Python bindings," said Ellis. "And so I'm hoping that JVector can be part of bootstrapping machine learning for Java developers as well."
JVector will be incorporated into the Apache Cassandra codebase in the next few weeks. In fact that integration is already happening, said Ellis, but it's not quite production-ready yet.
However, as a standalone project it could be used with other databases, or in embedded projects without a dedicated database.
Ellis expressed the hope that other companies and individuals will join the open-source community around the project. Unlike Cassandra, of which Ellis was project chair, JVector is a "bite-size" project, he said.
"It's much more approachable for a company or an individual to say, 'hey, I'd like to come and make that a bit better'."
Tell the truth
Reducing hallucinations is another benefit of modern ANN approaches. The preliminary retrieval steps help developers provide the right information in the right context to the model, making it less inclined to take a random punt. This approach has already improved the accuracy of current models, which is part of why GPT-4 is better than GPT-3.5. But Ellis says that even better things are on their way, including Active Retrieval Augmented Generation (ARAG), a technique that combines LLMs with efficient retrieval of external knowledge, enabling them to admit ignorance rather than guessing.
The way it works, Ellis explained, is that the LLM emits metadata alongside the text it outputs at each step: a number between 0 and 1 representing the probability that the text is accurate.
"You can look at these numbers and say, 'Oh wow, it looks like it's got stuck here because it's not sure what it's saying', so I'm going to take that and do another search on that topic specifically, so I can improve my answer on the fly."
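The loop Ellis describes can be sketched as follows: generate a step, check its confidence score, and if the score falls below a threshold, retrieve more evidence about the uncertain text and regenerate before committing to an answer. The `Model` and `Retriever` interfaces here are hypothetical stand-ins for an LLM and a vector search engine, not any real API.

```java
import java.util.List;

// A sketch of active retrieval-augmented generation: regenerate low-confidence
// output after fetching extra context. All interfaces are illustrative.
class ActiveRag {
    // One generation step: the produced text plus the model's
    // self-reported confidence in [0, 1].
    record Step(String text, double confidence) {}

    interface Model {
        Step next(String prompt, List<String> context);
    }

    interface Retriever {
        List<String> search(String query);
    }

    static String answer(Model model, Retriever retriever, String prompt,
                         List<String> context, double threshold) {
        Step step = model.next(prompt, context);
        if (step.confidence() < threshold) {
            // The model isn't sure what it's saying: search on the shaky
            // text specifically and regenerate, rather than letting it guess.
            List<String> extra = retriever.search(step.text());
            step = model.next(prompt, extra);
        }
        return step.text();
    }
}
```

A real system would run this per sentence or per span and could also return "I don't know" when confidence stays low even after retrieval, which is precisely the admission of ignorance Ellis is after.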
It will never be possible to get the probability of error to zero, but 1% should be well within the realm of possibility, he said.
"If I ask my co-worker, he's going to be wrong at least 1% of the time, so that's like parity!"
Asked about the challenge of keeping up with the breakneck pace of development in the machine learning sphere, Ellis said that's all part of the fun.
"I love the chaos, I love being part of something new that nobody knows the answers for, where there's a new way to do something better every month. That's really exciting, and it's a bonus for recruiting too. I can go to the best engineers I've ever worked with and say hey, come and build this with me, and I've had a very high success rate. Because it is exciting for that kind of person who enjoys the thrill of the hunt, doing something that's never been done before."
In AI/ML there are developments just waiting to be discovered, he added.
"One of the cool things about being in a new field like this is that there are still free lunches."
This article originally appeared on our sister site Computing.