RAG in Practice: Mini-Hackathon

How do I use Retrieval-Augmented Generation (RAG) for my tasks?

Large Language Models (LLMs) have become indispensable tools. Despite their impressive capabilities, however, they reach their limits when a task depends on factual knowledge and specific data. This is where Retrieval-Augmented Generation (RAG) comes into play. RAG combines the strengths of retrieval systems and generative models to deliver more precise, fact-based, and contextual answers. This workshop introduces you to the fundamentals of RAG and shows you how to use this technology in practice.

What is RAG?

RAG is a method that extends LLMs by integrating external data sources. A conventional LLM relies on the knowledge it acquired during training, which ends at the point of training; RAG additionally gives it access to current and specific information. It does so by combining two main components:

Retrieval System: This system searches a database or collection of documents to find relevant information.
Generative Model: The generative model uses the retrieved information to generate a natural and precise answer.
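
The interplay of the two components can be illustrated with a minimal sketch. It is not the tooling used in the workshop: the sample documents, the function names (tokenize, cosine_similarity, retrieve, build_prompt) and the simple word-count similarity are assumptions chosen for illustration, and real systems typically use embedding models and a vector database instead. The sketch stops at the prompt that would be handed to a generative model; no model is called.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine_similarity(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(query, documents, k=2):
    """Retrieval step: return up to k documents most similar to the query."""
    query_vec = Counter(tokenize(query))
    scored = [
        (cosine_similarity(query_vec, Counter(tokenize(doc))), doc)
        for doc in documents
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    # Drop documents that share no words with the query at all.
    return [doc for score, doc in scored[:k] if score > 0]


def build_prompt(query, passages):
    """Prepare the generation step: place retrieved passages before the question."""
    context = "\n".join(f"- {p}" for p in passages)
    return (
        "Answer the question using only the context below.\n"
        f"Context:\n{context}\n"
        f"Question: {query}"
    )


# Hypothetical sample data, invented for this illustration.
DOCUMENTS = [
    "The library is open Monday to Friday from 9:00 to 20:00.",
    "VPN access requires a valid university account.",
    "The cafeteria serves lunch between 11:30 and 14:00.",
]

if __name__ == "__main__":
    question = "When is the library open?"
    passages = retrieve(question, DOCUMENTS, k=1)
    # The resulting prompt would be sent to a generative model of your choice.
    print(build_prompt(question, passages))
```

Because the retrieved passages are placed directly in the prompt, the model's answer can be traced back to concrete source text.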

Through this combination, LLMs can deliver answers that are not only grammatically correct but also fact-based and verifiable, both in a chatbot and in enhanced search.

After a welcome and a brief introduction to the technical background, we will spend about two hours building prototypes in groups for our own application scenarios. All participants bring:

their own idea for using RAG
test data in an HU-Box in the formats Markdown, PDF, CSV, Word, Excel, PPT, JPG
a computer and access to the HU VPN
enthusiasm for experimenting

Goal of the Mini-Hackathon: All participants create their own small bot!

Register by September 10, 2025 by email to iz-d2mcm.contact@hu-berlin.de.

Program

Time	Program Item
10:00-10:15	Welcome and Introduction
10:15-10:30	Technical Background
10:30-12:30	Hackathon in Groups
12:30-13:00	Presentation of Prototypes & Discussion: Basic services vs. research software. What are the respective advantages and disadvantages in application?

Information about the Instructor

Malte Dreyer is Director of Computer and Media Service at Humboldt University Berlin and Chairman of the IT Board as well as the Information Processes Steering Group (CIO body) of HU. He is involved in numerous projects for the development of research data management and the use of artificial intelligence methods in teaching and learning. Previously, he was Director of the Research and Development Department of the Max Planck Society, Max Planck Digital Library. He designed and developed the research and publication data infrastructure for the institutes of the Max Planck Society as well as various research tools. As part of several major German, European and international projects, he has worked in the areas of digital research infrastructures, repositories, virtual research environments and software architectures in many scientific disciplines. Furthermore, he is a member of several national and international committees such as the HRK Commission for Digitalization, the Alliance Focus Group “Digital Infrastructures, Services and Data Tracking” and the US CIO Association LBCIO.