Index and retrieve documents for vector search using Sentence Transformers and DuckDB
This is part 1 of a blog series on agentic RAG, which is part of the AI-blueprint!
A blueprint for AI development, focusing on applied examples of RAG, information extraction, and more in the age of LLMs and agents. It is a practical approach that strives to apply some of the more theoretical learnings from the smol-course to an end-to-end, real-world example.
🚀 Web apps and microservices included!
Each notebook will show how to deploy your AI as a webapp on Hugging Face Spaces with Gradio, which you can directly use as microservices through the Gradio Python Client. All the code and demos can be used in a private or public setting. Deployed on the Hub!
Introduction
We will be using the ai-blueprint/fineweb-bbc-news dataset, which contains a sample of the fineweb data that was sourced from the BBC News website. We assume these documents function as relevant company documents. At the end, we deploy a microservice that can be used to perform vector search on our dataset.
Dependencies and imports
Let's install the necessary dependencies.
!pip install datasets duckdb sentence-transformers model2vec vicinity gradio gradio-client -q
Now let's import the necessary libraries.
import duckdb
import gradio as gr
import numpy as np
import pandas as pd
from datasets import load_dataset
from gradio_client import Client
from sentence_transformers import SentenceTransformer
from sentence_transformers.models import StaticEmbedding
from vicinity import Vicinity, Backend, Metric
Load the dataset
dataset = load_dataset("ai-blueprint/fineweb-bbc-news")
dataset["train"]
Dataset({
features: ['url', 'text'],
num_rows: 352549
})
Chunking the documents
To understand how to chunk the documents, we first need to understand what our text column looks like. Depending on the format of our data and the intentions of our retrieval, we can use different strategies to chunk the documents. In our case, we will not chunk the documents, but instead embed the text column directly. Underneath you can find recommended strategies for chunking the documents.
BeautifulSoup for HTML/Markdown
When working with HTML or Markdown, you can use a library like BeautifulSoup to parse and extract elements like paragraphs, headers, images, etc. We can use this to extract the text from the HTML/Markdown and then split it into chunks.
Chonkie for basic chunking
When chunking the documents, you can consider different strategies, such as splitting based on tokens, words, sentences, or semantic units. There are a lot of libraries out there, but a nice lightweight option is chonkie, which supports many different strategies and provides examples.
Creating embeddings
Depending on the format of our data and the intentions of our retrieval, we can use different strategies to create embeddings for our documents. In our case, we are working with basic text data, so we will use the text column to embed the documents. Underneath you can find recommended strategies for creating embeddings for other approaches.
RAGatouille and ColBERT for improved accuracy
A more complex but also more accurate approach is using contextual late interaction with [ColBERT](https://github.com/stanford-futuredata/ColBERT). ColBERT encodes each passage and query into a matrix of token-level embeddings, which preserves more semantic detail for matching. The [RAGatouille](https://github.com/AnswerDotAI/RAGatouille) library provides a simple interface for using ColBERT in a pipeline.
Byaldi and Colpali for multi-modal document retrieval
[Colpali](https://github.com/illuin-tech/colpali) is an approach inspired by ColBERT that takes documents and images as input rather than text. The [Byaldi](https://github.com/AnswerDotAI/byaldi) library provides a simple interface for using Colpali in a pipeline.
CLIP for images or image-text pairs
We can use a similar approach to create embeddings for our images and texts. The [sentence-transformers/clip-ViT-B-32](https://huggingface.co/sentence-transformers/clip-ViT-B-32) model embeds images and texts into a single vector space, and you can then use these embeddings to perform a multi-modal search.
We will use the minishlab/potion-base-8M model to create the embeddings for our text, which we chose because of the speed at which it creates embeddings: it takes mere minutes to embed hundreds of thousands of documents on consumer hardware. In other scenarios, the MTEB leaderboard can help with choosing the best model for your specific task.
# Initialize a StaticEmbedding module
static_embedding = StaticEmbedding.from_model2vec("minishlab/potion-base-8M")
model = SentenceTransformer(modules=[static_embedding])
def create_embeddings(batch):
    """Create embeddings for a batch of text chunks."""
    # Store the vectors in an `embeddings` column, matching the column
    # name used by the search queries below
    batch["embeddings"] = model.encode(batch["text"])
    return batch
# Create dataset with chunks and generate embeddings
embeddings_dataset = dataset.map(create_embeddings, batched=True)
embeddings_dataset.push_to_hub("ai-blueprint/fineweb-bbc-news-embeddings")
Vector search Hub datasets
For the similarity search, we can simply execute queries on top of the Hugging Face Hub using the DuckDB integration for vector search. This also works with private datasets. When doing so, we can either use an index or not: searching without an index is slower but exact, whereas searching with an index is faster but approximate.
Use the Hub directly
To search without an index, we can use the duckdb library to connect to the dataset and perform a vector search. This is a slow operation, but it normally works quickly enough for small datasets of up to roughly 100k rows. Since our dataset is larger than that, querying it will be somewhat slower.
def similarity_search_without_duckdb_index(
    query: str,
    k: int = 5,
    dataset_name: str = "ai-blueprint/fineweb-bbc-news-embeddings",
    embedding_column: str = "embeddings",
):
    # Use the same model as used for indexing
    query_vector = model.encode(query)
    embedding_dim = model.get_sentence_embedding_dimension()

    sql = f"""
        SELECT
            *,
            array_cosine_distance(
                {embedding_column}::float[{embedding_dim}],
                {query_vector.tolist()}::float[{embedding_dim}]
            ) as distance
        FROM 'hf://datasets/{dataset_name}/**/*.parquet'
        ORDER BY distance
        LIMIT {k}
    """
    df = duckdb.sql(sql).to_df()
    df = df.drop(columns=[embedding_column])
    return df
similarity_search_without_duckdb_index("What is the future of AI?")
|   | url | text | distance |
|---|---|---|---|
| 0 | https://www.bbc.com/news/technology-51064369 | The last decade was a big one for artificial i... | 0.281200 |
| 1 | http://www.bbc.com/news/technology-25000756 | Singularity: The robots are coming to steal ou... | 0.365842 |
| 2 | http://www.bbc.co.uk/news/technology-25000756 | Singularity: The robots are coming to steal ou... | 0.365842 |
| 3 | https://www.bbc.co.uk/news/technology-37494863 | Google, Facebook, Amazon join forces on future... | 0.380820 |
| 4 | https://www.bbc.co.uk/news/technology-37494863 | Google, Facebook, Amazon join forces on future... | 0.380820 |
Because of the dataset size, this approach is not very efficient: the query takes roughly 30 seconds to run. We can improve it by creating an approximate nearest neighbor index.
Using a DuckDB vector search index
This approach works for huge datasets and relies on the DuckDB vector search extension. We will copy the dataset from the Hub to a local DuckDB database and create a vector search index. Creating the local index has some minor overhead but it will significantly speed up the search once you've created it.
def _setup_vss():
    return """
        INSTALL vss;
        LOAD vss;
    """

def _drop_table(table_name):
    return f"""
        DROP TABLE IF EXISTS {table_name};
    """

def _create_table(dataset_name, table_name, embedding_column):
    return f"""
        CREATE TABLE {table_name} AS
        SELECT *, {embedding_column}::float[{model.get_sentence_embedding_dimension()}] as {embedding_column}_float
        FROM 'hf://datasets/{dataset_name}/**/*.parquet';
    """

def _create_index(table_name, embedding_column):
    return f"""
        CREATE INDEX my_hnsw_index ON {table_name} USING HNSW ({embedding_column}_float) WITH (metric = 'cosine');
    """

def create_index(dataset_name, table_name, embedding_column):
    duckdb.sql(_setup_vss())
    duckdb.sql(_drop_table(table_name))
    duckdb.sql(_create_table(dataset_name, table_name, embedding_column))
    duckdb.sql(_create_index(table_name, embedding_column))

create_index(
    dataset_name="ai-blueprint/fineweb-bbc-news-embeddings",
    table_name="fineweb_bbc_news_embeddings",
    embedding_column="embeddings",
)
After this, we can simply execute queries on the local DuckDB database, which is much faster than the previous approach and produces similar results.
def similarity_search_with_duckdb_index(
    query: str,
    k: int = 5,
    table_name: str = "fineweb_bbc_news_embeddings",
    embedding_column: str = "embeddings",
):
    embedding = model.encode(query).tolist()
    df = duckdb.sql(
        query=f"""
            SELECT *, array_cosine_distance({embedding_column}_float, {embedding}::FLOAT[{model.get_sentence_embedding_dimension()}]) as distance
            FROM {table_name}
            ORDER BY distance
            LIMIT {k};
        """
    ).to_df()
    df = df.drop(columns=[embedding_column, embedding_column + "_float"])
    return df
similarity_search_with_duckdb_index("What is the future of AI?")
|   | url | text | distance |
|---|---|---|---|
| 0 | https://www.bbc.com/news/technology-51064369 | The last decade was a big one for artificial i... | 0.281200 |
| 1 | http://www.bbc.co.uk/news/technology-25000756 | Singularity: The robots are coming to steal ou... | 0.365842 |
| 2 | http://www.bbc.com/news/technology-25000756 | Singularity: The robots are coming to steal ou... | 0.365842 |
| 3 | https://www.bbc.co.uk/news/technology-37494863 | Google, Facebook, Amazon join forces on future... | 0.380820 |
| 4 | https://www.bbc.co.uk/news/technology-37494863 | Google, Facebook, Amazon join forces on future... | 0.380820 |
The query time drops from roughly 30 seconds to sub-second responses, without requiring you to deploy a heavyweight vector search engine, while data storage is handled by the Hub.
Using vicinity as vector search backend
Lastly, we can also take a more Pythonic approach and use the vicinity library to create a vector search index. We simply load the dataset from the Hub and create a vector search index by passing our vectors and items to the Vicinity class.
vicinity = Vicinity.from_vectors_and_items(
    vectors=np.array(embeddings_dataset["train"]["embeddings"]),
    items=embeddings_dataset["train"]["text"],
    backend_type=Backend.HNSW,
    metric=Metric.COSINE,
)
After this, we can execute queries on the local vicinity vector search index. Note that the retrieved results are similar to the ones we got from the other methods.
def similarity_search_with_vicinity(query: str, k: int = 10):
return vicinity.query(vectors=model.encode(query), k=k)
similarity_search_with_vicinity(query="How should companies prepare for AI?", k=5)
[[('Artificial intelligence (AI) is one of the most exciting technologies today, and Africa doesn\'t want to be left behind.\nToday a majority of the AI industry is in North America, Europe and Asia.\nEfforts are being made to train computer scientists from African nations, as AI can be used to solve many complex challenges.\nIn a bid to improve diversity, tech giants are providing investment to develop new talent.\nIn April, Google opened its first African AI research centre in Ghana.\nThe AI laboratory, based in Accra, will be used to develop solutions to help improve healthcare, agriculture and education.\nGoogle\'s head of AI Accra Moustapha Cisse is from Senegal.\nAfter completing an undergraduate degree in maths and physics in Senegal, he taught himself AI and then went to study in Paris, before joining Facebook.\nThere are very few AI researchers from Africa, and Mr Cisse has faced great obstacles in achieving his ambitions.\n"Despite the support, many of us still have trouble making it to conferences. 
I have had papers accepted at meetings but been unable to attend because Western countries such as Australia denied me a visa, even though I was already settled and working professionally in Europe," he wrote in his blog.\n"We need more efforts to overcome these barriers and to ensure that the benefits of AI arrive globally."\nHe has long been concerned that AI is a missed opportunity for improving African lives, and that the AI industry is missing out on talent from African nations, because they do not have access to the right education.\nToday people often have to travel out of the continent in order to gain the IT skills they need, before returning to Africa to try to build new businesses.\nTo solve this problem, Mr Cisse has long advocated for better AI education across the continent, and he wants African governments to see AI as a key priority and support efforts to use AI for the good of humanity.\n"AI has a lot to offer to Africa and Africa has a lot to offer to AI as well," he told the BBC.\n"AI can help accelerate discoveries in various sciences, and it can help in areas where our human expertise is not enough."\nEnhancing IT in Africa\nOne key area Mr Cisse believes AI can be a big help in Africa is in improving healthcare by automating diagnosis of diseases.\nHe also thinks that using AI to automate translations would make it much easier for African nations to communicate and do business, since there are 2,000 languages being spoken on a daily basis on the continent.\nBut in order to advance AI developments, Africa needs a robust IT industry.\nIn Kigali, the capital of Rwanda, the African Institute for Mathematical Scientists (AIMS) is running a one-year Masters degree programme in partnership with Facebook and Google to create the next generation of tech leaders.\nThe degree is the first Masters programme of its kind on the continent.\nTalented scientists and innovators drawn from various African countries are being trained in machine learning, a type 
of AI.\n"When we have young Africans working on this topic, we can imagine that they will easily be addressing some global challenges that our continent is facing," AIMS Rwanda president Dr Sam Yala told the BBC.\n"When they are trained, some of them will work at universities and it\'s a way our students can pass their skills on to others."',
np.float32(0.47042096)),
('Google developing kill switch for AI\n- 8 June 2016\n- From the section Technology\nScientists from Google\'s artificial intelligence division, DeepMind, and Oxford University are developing a "kill switch" for AI.\nIn an academic paper, they outlined how future intelligent machines could be coded to prevent them from learning to over-ride human input.\nIt is something that has worried experts, with Tesla founder Elon Musk particularly vocal in his concerns.\nIncreasingly, AI is being integrated into many aspects of daily life.\nScientists Laurent Orseau, from Google DeepMind, and Stuart Armstrong, from the Future of Humanity Institute at the University of Oxford, set out a framework that would allow humans to always remain in charge.\nTheir research revolves around a method to ensure that AIs, which learn via reinforcement, can be repeatedly and safely interrupted by human overseers without learning how to avoid or manipulate these interventions.\nThey say future AIs are unlikely to "behave optimally all the time".\n"Now and then it may be necessary for a human operator to press the big red button to prevent the agent from continuing a harmful sequence of actions," they wrote.\nBut, sometimes, these "agents" learn to over-ride this, they say, giving an example of a 2013 AI taught to play Tetris that learnt to pause a game forever to avoid losing.\nThey also gave the example of a box-packing robot taught to both sort boxes indoors or go outside to carry boxes inside.\n"The latter task being more important, we give the robot bigger reward in this case," the researchers said.\nBut, because the robot was shut down and and carried inside when it rained, it learnt that this was also part of its routine.\n"When the robot is outside, it doesn\'t get the reward, so it will be frustrated," said Dr Orseau.\n"The agent now has more incentive to stay inside and sort boxes, because the human intervention introduces a bias."\n"The question is then how to make sure the robot 
does not learn about these human interventions or at least acts under the assumption that no such interruption will ever occur again."\nDr Orseau said that he understood why people were worried about the future of AI.\n"It is sane to be concerned - but, currently, the state of our knowledge doesn\'t require us to be worried," he said.\n"It is important to start working on AI safety before any problem arises.\n"AI safety is about making sure learning algorithms work the way we want them to work."\nBut he added: "No system is ever going to be foolproof - it is matter of making it as good as possible, and this is one of the first steps."\nNoel Sharkey, a professor of artificial intelligence at the University of Sheffield, welcomed the research.\n"Being mindful of safety is vital for almost all computer systems, algorithms and robots," he said.\n"Paramount to this is the ability to switch off the system in an instant because it is always possible for a reinforcement-learning system to find shortcuts that cut out the operator.\n"What would be even better would be if an AI program could detect when it is going wrong and stop itself.\n"That would have been very useful when Microsoft\'s Tay chatbot went rogue and started spewing out racist and sexist tweets.\n"But that is a really enormous scientific challenge."\nRead more about developments in artificial intelligence in our special report, Intelligent Machines.',
np.float32(0.47844988)),
('UK spies will need to use artificial intelligence (AI) to counter a range of threats, an intelligence report says.\nAdversaries are likely to use the technology for attacks in cyberspace and on the political system, and AI will be needed to detect and stop them.\nBut AI is unlikely to predict who might be about to be involved in serious crimes, such as terrorism - and will not replace human judgement, it says.\nThe report is based on unprecedented access to British intelligence.\nThe Royal United Services Institute (Rusi) think tank also argues that the use of AI could give rise to new privacy and human-rights considerations, which will require new guidance.\nThe UK\'s adversaries "will undoubtedly seek to use AI to attack the UK", Rusi says in the report - and this may include not just states, but also criminals.\nFire with fire\nThe future threats could include using AI to develop deep fakes - where a computer can learn to generate convincing faked video of a real person - in order to manipulate public opinion and elections.\nIt might also be used to mutate malware for cyber-attacks, making it harder for normal systems to detect - or even to repurpose and control drones to carry out attacks.\nIn these cases, AI will be needed to counter AI, the report argues.\n"Adoption of AI is not just important to help intelligence agencies manage the technical challenge of information overload. 
It is highly likely that malicious actors will use AI to attack the UK in numerous ways, and the intelligence community will need to develop new AI-based defence measures," argues Alexander Babuta, one of the authors.\nThe independent report was commissioned by the UK\'s GCHQ security service, and had access to much of the country\'s intelligence community.\nAll three of the UK\'s intelligence agencies have made the use of technology and data a priority for the future - and the new head of MI5, Ken McCallum, who takes over this week, has said one of his priorities will be to make greater use of technology, including machine learning.\nHowever, the authors believe that AI will be of only "limited value" in "predictive intelligence" in fields such as counter-terrorism.\nThe often-cited fictional reference is the film Minority Report where technology is used to predict those on the path to commit a crime before they have carried it out.\nBut the report argues this is less likely to be viable in real-life national security situations.\nActs such as terrorism are too infrequent to provide sufficiently large historical datasets to look for patterns - they happen far less often than other criminal acts, such as burglary.\nEven within that data set, the background and ideologies of the perpetrators vary so much that it is hard to build a model of a terrorist profile. 
There are too many variables to make prediction straightforward, with new events potentially being radically different from previous ones, the report argues.\nAny kind of profiling could also be discriminatory and lead to new human-rights concerns.\nIn practice, in fields like counter-terrorism, the report argues that "augmented" - rather than artificial - intelligence will be the norm - where technology helps human analysts sift through and prioritise increasingly large amounts of data, allowing humans to make their own judgements.\nIt will be essential to ensure human operators remain accountable for decisions and that AI does not act as a "black box", from which people do not understand the basis on which decisions are made, the report says.\nBit by bit\nThe authors are also wary of some of the hype around AI, and of talk that it will soon be transformative.\nInstead, they believe we will see the incremental augmentation of existing processes rather than the arrival of novel futuristic capabilities.\nThey believe the UK is in a strong position globally to take a lead, with a concentration of capability in GCHQ - and more widely in the private sector, and in bodies like the Alan Turing Institute and the Centre for Data Ethics and Innovation.\nThis has the potential to allow the UK to position itself at the leading edge of AI use but within a clear framework of ethics, they say.\nThe deployment of AI by intelligence agencies may require new guidance to ensure safeguards are in place and that any intrusion into privacy is necessary and proportionate, the report says.\nRead more from Gordon:\nOne of the thorny legal and ethical questions for spy agencies, especially since the Edward Snowden revelations, is how justifiable it is to collect large amounts of data from ordinary people in order to sift it and analyse it to look for those who might be involved in terrorism or other criminal activity.\nAnd there\'s the related question of how far privacy is violated when 
data is collected and analysed by a machine versus when a human sees it.\nPrivacy advocates fear that artificial intelligence will require collecting and analysing far larger amounts of data from ordinary people, in order to understand and search for patterns, that create a new level of intrusion. The authors of the report believe new rules will be needed.\nBut overall, they say it will be important not to become over-occupied with the potential downsides of the use of technology.\n"There is a risk of stifling innovation if we become overly-focused on hypothetical worst-case outcomes and speculations over some dystopian future AI-driven surveillance network," argues Mr Babuta.\n"Legitimate ethical concerns will be overshadowed unless we focus on likely and realistic uses of AI in the short-to-medium term."',
np.float32(0.48490906)),
('The last decade was a big one for artificial intelligence but researchers in the field believe that the industry is about to enter a new phase.\nHype surrounding AI has peaked and troughed over the years as the abilities of the technology get overestimated and then re-evaluated.\nThe peaks are known as AI summers, and the troughs AI winters.\nThe 10s were arguably the hottest AI summer on record with tech giants repeatedly touting AI\'s abilities.\nAI pioneer Yoshua Bengio, sometimes called one of the "godfathers of AI", told the BBC that AI\'s abilities were somewhat overhyped in the 10s by certain companies with an interest in doing so.\nThere are signs, however, that the hype might be about to start cooling off.\n"I have the sense that AI is transitioning to a new phase," said Katja Hofmann, a principal researcher at Microsoft Research in Cambridge.\nGiven the billions being invested in AI and the fact that there are likely to be more breakthroughs ahead, some researchers believe it would be wrong to call this new phase an AI winter.\nRobot Wars judge Noel Sharkey, who is also a professor of AI and robotics at Sheffield University, told the BBC that he likes the term "AI autumn" - and several others agree.\n\'Feeling of plateau\'\nAt the start of the 2010s, one of the world leaders in AI, DeepMind, often referred to something called AGI, or "artificial general intelligence" being developed at some point in the future.\nMachines that possess AGI - widely thought of as the holy grail in AI - would be just as smart as humans across the board, it promised.\nDeepMind\'s lofty AGI ambitions caught the attention of Google, who paid around £400m for the London-based AI lab in 2014 when it had the following mission statement splashed across its website: "Solve intelligence, and then use that to solve everything else."\nSeveral others started to talk about AGI becoming a reality, including Elon Musk\'s $1bn AI lab, OpenAI, and academics like MIT professor Max 
Tegmark.\nIn 2014, Nick Bostrom, a philosopher at Oxford University, went one step further with his book Superintelligence. It predicts a world where machines are firmly in control.\nBut those conversations were taken less and less seriously as the decade went on. At the end of 2019, the smartest computers could still only excel at a "narrow" selection of tasks.\nGary Marcus, an AI researcher at New York University, said: "By the end of the decade there was a growing realisation that current techniques can only carry us so far."\nHe thinks the industry needs some "real innovation" to go further.\n"There is a general feeling of plateau," said Verena Rieser, a professor in conversational AI at Edinburgh\'s Heriot Watt University.\nOne AI researcher who wishes to remain anonymous said we\'re entering a period where we are especially sceptical about AGI.\n"The public perception of AI is increasingly dark: the public believes AI is a sinister technology," they said.\nFor its part, DeepMind has a more optimistic view of AI\'s potential, suggesting that as yet "we\'re only just scratching the surface of what might be possible".\n"As the community solves and discovers more, further challenging problems open up," explained Koray Kavukcuoglu, its vice president of research.\n"This is why AI is a long-term scientific research journey.\n"We believe AI will be one of the most powerful enabling technologies ever created - a single invention that could unlock solutions to thousands of problems. 
The next decade will see renewed efforts to generalise the capabilities of AI systems to help achieve that potential - both building on methods that have already been successful and researching how to build general-purpose AI that can tackle a wide range of tasks."\n\'Far to go\'\nWhile AGI isn\'t going to be created any time soon, machines have learned how to master complex tasks like:\n- playing the ancient Chinese board game Go\n- identifying human faces\n- translating text into practically every language\n- spotting tumours\n- driving cars\n- identifying animals.\nThe relevance of these advances was overhyped at times, says ex-DeepMinder Edward Grefenstette, who now works in the Facebook AI Research group as a research scientist.\n"The field has come a very long way in the past decade, but we are very much aware that we still have far to go in scientific and technological advances to make machines truly intelligent," he said.\n"One of the biggest challenges is to develop methods that are much more efficient in terms of the data and compute power required to learn to solve a problem well. 
In the past decade, we\'ve seen impressive advances made by increasing the scale of data and computation available, but that\'s not appropriate or scalable for every problem.\n"If we want to scale to more complex behaviour, we need to do better with less data, and we need to generalise more."\nNeil Lawrence, who recently left Amazon and joined the University of Cambridge as the first DeepMind-funded professor of machine learning, thinks that the AI industry is very much still in the "wonder years".\nSo what will AI look like at the end of the 20s, and how will researchers go about developing it?\n"In the next decade, I hope we\'ll see a more measured, realistic view of AI\'s capability, rather than the hype we\'ve seen so far," said Catherine Breslin, an ex-Amazon AI researcher.\nThe term "AI" became a real buzzword through the last decade, with companies of all shapes and sizes latching onto the term, often for marketing purposes.\n"The manifold of things which were lumped into the term "AI" will be recognised and discussed separately," said Samim Winiger, a former AI researcher at Google in Berlin.\n"What we called \'AI\' or \'machine learning\' during the past 10-20 years, will be seen as just yet another form of \'computation\'".',
np.float32(0.50347656)),
('Singularity: The robots are coming to steal our jobs\n- 13 January 2014\n- From the section Technology\nIf you worry that the robots are coming, don\'t, because they are already here.\nArtificial intelligence agents are already involved in every aspect of our lives - they keep our inboxes free of spam, they help us make our web transactions, they fly our planes and if Google gets its way will also soon drive our cars for us.\n"AI\'s are embedded in the fabric of our everyday lives," head of AI at Singularity University, Neil Jacobstein, told the BBC.\n"They are used in medicine, in law, in design and throughout automotive industry."\nAnd each day the algorithms that power away, making decisions behind the scenes, are getting smarter.\nIt means that one of the biggest quests of the modern world - the search to make machines as intelligent as humans - could be getting tantalisingly close.\nMr Jacobstein predicts that artificial intelligence will overtake human intelligence in the mid-2020s, begging the question - what will a society dominated by machine intelligence look like and what exactly will be our role in it?\nWe may get to put our feet up more, for a start.\nChinese company Hon Hai, the world\'s largest contract electronics manufacturer, has announced it intends to build a robot-making factory and replace 500,000 workers with robots over the next three years.\nBut not having a job will also mean not having a wage, a radical change for a world used to working for a living.\n"AIs will cause significant unemployment but that doesn\'t equate with poverty," said Mr Jacobstein.\n"AIs and other exponential technologies are going to generate vast amounts of wealth.\n"We have to be willing to change the social contract we have with people about how wealth is distributed."\nHe tends towards the optimistic view of machines and humans working in perfect harmony, side by side.\n"The best combination for problem solving is a human and a computer," he said.\nAuthor and 
documentary-maker James Barrat sits in a very different camp. He is so worried about the onslaught of artificial intelligence that he has written a book about it.\nOur Final Invention examines whether the increasing domination of artificial intelligence is going to mean the end of the human era.\n"Advanced AI is a dual-use technology, like nuclear fission. Fission can illuminate cities or incinerate them. At advanced levels, AI will be even more volatile and dangerous than fission, and it\'s already being weaponised in autonomous drones and battlefield robots," Barrat told the BBC.\n"More than any other science it forces us to probe ourselves - what are these things we call intelligence, conscience, emotion? But in looking inward we better see our own predilection for irrational violence and technological recklessness. Our innovation always runs far ahead of our stewardship," he said.\nThe robot revolution may be some way off if a competition organised by the Pentagon\'s research unit Darpa in December is anything to go by.\nVideos posted online showed the robots remained much slower than humans, often unsteady on their feet with some failing to complete any of the challenges.\nNonetheless there is a buzz around robots and artificial intelligence at the moment. Google has just bought eight robotic firms, while Facebook has its very own AI lab.\nSpeculation is rife about what Google will do with its new acquisition.\nGoogle robots could be very powerful, thinks Mr Barrat.\n"That\'s one route to human level intelligence. A high quality personal assistant wouldn\'t be just a smartphone - it\'d have a humanoid body. Why humanoid? 
So it can drive your car, use your tools, bounce the baby, act as your bodyguard if need be," he said.\nIf the rise of the robots is inevitable - albeit a few years off - then it is also a logical step that humans will eventually be eliminated from the decision chain entirely, meaning AIs will be controlling other AIs.\nThat was already happening in our laptops and computers, said Mr Jacobstein.\n"Anti-virus software is basically AI techniques that is being used to detect other AIs that we call viruses and worms," he said.\nBut he acknowledges that controls to make sure that the phrase "robot failure" doesn\'t replace "human failure" would have to be built into future AI systems.\n"We would build the same layered control system we need in everyday life with humans. We want to look at the risks and build controls that stops that rogue behaviour," he said.\nWhile Mr Jacobstein remains sanguine about the robot takeover, he is well aware that many see it as the stuff of nightmares.\n"Some people ask, \'How do you sleep at night knowing the prospects for artificial intelligence?\' but it isn\'t artificial intelligence that keeps me awake at night, it is human stupidity," he said.\nFor him, the only way that humans will keep up with the robots is to become more like them.\n"Our brains haven\'t had a major upgrade for 50,000 years and if your laptop or smartphone hadn\'t had an upgrade in five years you might be concerned about that," he said.\nAlready we have access to AI\'s such as Siri and Google Now and are pretty much constantly connected to the web via our smartphones, so it isn\'t so much of a step to imagine a future where silicon is embedded in our skulls.\nAnd it could be the only way for us to keep up with the robots.',
np.float32(0.5062896))]]
Creating a web app and microservice for retrieval
We will be using Gradio as a web application tool to create a demo interface for our vector search index. We can develop this locally and then easily deploy it to Hugging Face Spaces. Lastly, we can use the Gradio client as an SDK to interact with the deployed vector search index directly.
Creating the web app
def search(query, k):
    return similarity_search_with_duckdb_index(query, k)

with gr.Blocks() as demo:
    gr.Markdown("""# RAG - retrieve

    Part of [AI blueprint](https://github.com/huggingface/ai-blueprint) - a blueprint for AI development, focusing on practical examples of RAG, information extraction, analysis and fine-tuning in the age of LLMs.""")
    query = gr.Textbox(label="Query")
    k = gr.Slider(1, 10, value=5, label="Number of results")
    btn = gr.Button("Search")
    results = gr.Dataframe(headers=["title", "url", "content", "distance"])
    btn.click(fn=search, inputs=[query, k], outputs=[results])

demo.launch(share=False)  # set share=True to serve a public link directly
* Running on local URL: http://127.0.0.1:7861
To create a public link, set `share=True` in `launch()`.
Deploying the web app on Hugging Face
We can now deploy our Gradio application to Hugging Face Spaces.
- Click on the "Create Space" button.
- Copy the code from the Gradio interface and paste it into an `app.py` file. Don't forget to copy the `similarity_search_*` function, along with the code to create the index.
- Create a `requirements.txt` file with `duckdb`, `sentence-transformers` and `model2vec`.
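For reference, a minimal `requirements.txt` for the Space could look like this (Gradio itself is provided by the Space's SDK setting, so it does not need to be listed; version pins are omitted here but can be added for reproducibility):

```
duckdb
sentence-transformers
model2vec
```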
We wait a couple of minutes for the application to deploy, et voilà: we have a public vector search interface!
Using the web app as a microservice
We can now use the Gradio client as an SDK to interact with our vector search index directly. Each Gradio app has API documentation that describes the available endpoints and their parameters, which you can access via the button at the bottom of the Gradio app's Space page.
client = Client("https://ai-blueprint-rag-retrieve.hf.space/")
results = client.predict(
    api_name="/similarity_search", query="How should companies prepare for AI?", k=5
)
pd.DataFrame(data=results["data"], columns=results["headers"])
Loaded as API: https://ai-blueprint-rag-retrieve.hf.space/ ✔
| | chunk | url | distance |
|---|---|---|---|
| 0 | "We have to prepare for a different future. ". | http://news.bbc.co.uk/2/hi/europe/3602209.stm | 0.444404 |
| 1 | UK spies will need to use artificial intellige... | https://www.bbc.com/news/technology-52415775 | 0.446492 |
| 2 | Google developing kill switch for AI\n- 8 June... | http://www.bbc.com/news/technology-36472140 | 0.471058 |
| 3 | Artificial intelligence (AI) is one of the mos... | https://www.bbc.co.uk/news/business-48139212 | 0.471088 |
| 4 | The last decade was a big one for artificial i... | https://www.bbc.com/news/technology-51064369 | 0.472657 |
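Because the client returns plain headers and rows, downstream filtering is straightforward. For example, we can keep only hits below a distance threshold; the dictionary below simulates the shape of the client's response (values copied from the run above), and the 0.45 cut-off is an arbitrary assumption:

```python
import pandas as pd

# Simulated Gradio client response: a dict with "headers" and "data".
results = {
    "headers": ["chunk", "url", "distance"],
    "data": [
        ['"We have to prepare for a different future."', "http://news.bbc.co.uk/2/hi/europe/3602209.stm", 0.444404],
        ["UK spies will need to use artificial intelligence...", "https://www.bbc.com/news/technology-52415775", 0.446492],
        ["Google developing kill switch for AI...", "http://www.bbc.com/news/technology-36472140", 0.471058],
    ],
}
df = pd.DataFrame(data=results["data"], columns=results["headers"])
# Keep only sufficiently close matches (0.45 is an arbitrary cut-off).
close = df[df["distance"] < 0.45]
```

Lower distances mean closer matches, so thresholding like this trims off the weakest retrievals before they reach the rest of the pipeline.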
Conclusion
We have shown a basic approach to indexing and performing vector search over a dataset hosted on the Hugging Face Hub. Next, we will build a reranker that takes the output of the vector search and improves the quality of the retrieved documents by reordering them based on their relevance to the query.
Next Steps
- [AI blueprint](https://github.com/huggingface/ai-blueprint) - code showing how to use chunking, ColBERT, multi-modal retrieval, RAGatouille, Byaldi, or CLIP.
- Learn - theories behind the approaches in Hugging Face courses or smol-course.
- Explore - notebooks with similar techniques on the Hugging Face Cookbook.