Let your knowledge grow - Evaluate protected data with LLMs
The world is becoming ever more information-rich, and that information is of varying quality. Large Language Models (LLMs) such as ChatGPT are trained on this information to answer our questions and assist us. To improve their answers, LLMs can use methods such as Retrieval-Augmented Generation (RAG), which attaches external information to the question before the model generates a response. However, the most valuable and interesting information is often not freely accessible. So how can large language models deliver better answers by incorporating protected information?
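To make the RAG idea concrete, here is a minimal sketch in Python. It only illustrates the mechanism described above: retrieved passages are attached to the user's question before it is handed to an LLM. The retriever and the LLM call are hypothetical stand-ins, not a specific library or product API.

```python
# Minimal RAG sketch: attach external context to the question before the LLM call.
# retrieve_passages and the final LLM call are illustrative stubs (assumptions).

def retrieve_passages(question: str) -> list[str]:
    """Stand-in for a retriever; a real system would query an index or data source."""
    return [
        "Passage 1: background facts relevant to the question.",
        "Passage 2: further context from an external source.",
    ]

def build_rag_prompt(question: str, passages: list[str]) -> str:
    """Attach the retrieved context to the question, as described above."""
    context = "\n".join(f"- {p}" for p in passages)
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )

if __name__ == "__main__":
    question = "How does RAG improve LLM answers?"
    prompt = build_rag_prompt(question, retrieve_passages(question))
    print(prompt)  # this augmented prompt would then be sent to the LLM
```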
If we combine LLMs and data spaces using the concept of retrieval-augmented generation, a question can be sent to a data space, where each data provider estimates how likely it is that its data set will improve the answer. The data sets are then offered to the user together with this relevance estimate. The user can accept the terms of use (e.g. by purchasing the data) and thereby gain access to the raw data. The LLM on the consumer side then generates an answer to the original question from the newly acquired data.
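The sketch below walks through that sequence of steps. It is an illustrative sketch only and not the Fraunhofer ISST/Boot-X implementation: the data space, the providers, the relevance scores, the terms of use and the consumer-side LLM are all hypothetical stand-ins chosen to make the flow concrete.

```python
# Illustrative end-to-end flow: query the data space, receive relevance-scored
# offers, accept the terms for the best offer, fetch the raw data, and let the
# consumer-side LLM answer the original question. All names are assumptions.
from dataclasses import dataclass

@dataclass
class Offer:
    provider: str
    dataset_id: str
    relevance: float   # provider's estimate that its data improves the answer
    terms: str         # terms of use the consumer must accept, e.g. a purchase

def query_data_space(question: str) -> list[Offer]:
    """Each provider in the data space scores how likely its data set helps."""
    return [
        Offer("provider-a", "ds-42", relevance=0.91, terms="pay-per-use"),
        Offer("provider-b", "ds-07", relevance=0.35, terms="subscription"),
    ]

def fetch_raw_data(offer: Offer) -> str:
    """Stand-in for contract acceptance and transfer of the protected raw data."""
    return f"raw records of {offer.dataset_id} from {offer.provider}"

def answer_with_llm(question: str, context: str) -> str:
    """Stand-in for the consumer-side LLM that generates the final answer."""
    return f"Answer to '{question}' grounded in: {context}"

if __name__ == "__main__":
    question = "What is the current utilisation of machine type X?"
    offers = sorted(query_data_space(question), key=lambda o: o.relevance, reverse=True)
    best = offers[0]                 # user reviews relevance estimates and terms
    if best.relevance > 0.8:         # user accepts the terms of use for this offer
        raw_data = fetch_raw_data(best)
        print(answer_with_llm(question, raw_data))
```

In a real data space, the acceptance step would involve contract negotiation between connectors rather than a simple threshold check; the threshold here only stands in for the user's decision.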
By linking data spaces with LLMs via the concept of RAG, business cases can be realised that were not feasible in the past. LLM-based data spaces for searching protected data were developed by Fraunhofer ISST in cooperation with Huawei, using the Boot-X technology.