
EU initiative to make web search more open and ethical


On 6 June, the OpenWebSearch.eu consortium released a pilot of a new infrastructure that aims to make European web search fairer, more transparent and commercially unbiased. With strong participation by CERN, the European Open Web Index (OWI) is now open for use by academic, commercial and independent teams under a general research licence, with commercial options in development on a case-by-case basis.

The OpenWebSearch.eu initiative was launched in 2022, with a consortium made up of 14 leading research institutions from across Europe, including CERN. The project aims to build a public web index that offers an alternative to existing indexes held by companies like Google (USA), Microsoft (USA), Baidu (China) and Yandex (Russia). Web indexes provide the back-end data infrastructure behind search engines, and today the companies that manage them determine what content is searchable and how it is ranked. Currently, Europe does not have a search index of its own, making it vulnerable to digital dependence.

The OWI offers a clear alternative based on European values. Because the project is cross-disciplinary, with continuous dialogue between technical teams and legal, ethical and social experts, fairness and privacy are built into the OWI from the start. “Over thirty years since the World Wide Web was created at CERN and released to the public, our commitment to openness continues,” says Noor Afshan Fathima, IT research fellow at CERN. “Search is the next logical step in democratising digital access, especially as we enter the AI era.” The OWI also supports AI applications, allowing web search data to be used for training large language models (LLMs), generating embeddings and powering chatbots.

The CERN team has built key parts of the infrastructure that powers the OWI’s crawling and indexing capabilities, including keeping track of which webpages should be scanned. The system handles about 9 million URLs per hour, which equates to roughly 3 terabytes of public web data a day, with the aim of indexing 30–50% of the text-based web by the end of 2025. “We have already hit our target of indexing one petabyte of openly licensed web data, and our public dashboard helps users monitor that progress,” says Noor.
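The throughput figures quoted above can be cross-checked with a little arithmetic. The sketch below is illustrative only: it assumes decimal units (1 TB = 10^12 bytes, 1 PB = 10^15 bytes) and a constant crawl rate around the clock, and the article does not say whether the 3 TB is raw or compressed data. The one-petabyte milestone was presumably reached over time at varying rates, so the "days to 1 PB" figure is just a sense of scale, not a project timeline.

```python
# Back-of-the-envelope check of the crawl figures quoted in the article.
# Assumptions (not stated in the article): decimal units, and the hourly
# rate is sustained 24 hours a day.

URLS_PER_HOUR = 9_000_000
BYTES_PER_DAY = 3 * 10**12  # ~3 TB of public web data per day
TARGET_BYTES = 1 * 10**15   # 1 PB milestone


def urls_per_day(urls_per_hour: int) -> int:
    """URLs processed in 24 hours at a constant hourly rate."""
    return urls_per_hour * 24


def avg_bytes_per_url(bytes_per_day: int, urls_day: int) -> float:
    """Implied average amount of data collected per crawled URL."""
    return bytes_per_day / urls_day


def days_to_target(target_bytes: int, bytes_per_day: int) -> float:
    """Days needed to accumulate target_bytes at a constant daily rate."""
    return target_bytes / bytes_per_day


if __name__ == "__main__":
    daily = urls_per_day(URLS_PER_HOUR)
    print(f"URLs per day: {daily:,}")  # 216,000,000
    print(f"Average per URL: {avg_bytes_per_url(BYTES_PER_DAY, daily) / 1000:.1f} kB")  # 13.9 kB
    print(f"Days to 1 PB: {days_to_target(TARGET_BYTES, BYTES_PER_DAY):.0f}")  # 333
```

Under these assumptions, 9 million URLs per hour is about 216 million URLs a day, and 3 TB a day implies an average of roughly 14 kB per URL.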

CERN is also contributing to other parts of the project. For example, it is scanning its own public physics content to enhance the OWI, as well as developing an internal index and its own search tools and services. A prototype use case for the OWI is also in development: “Nooon”, a research-driven search engine for people with disabilities who need search engines that surface structured, accessible and representative information while protecting privacy in both access and contribution.

The release of the OWI, which has received funding from the European Union’s Horizon research and innovation programme, comes at a pivotal time. The European Commission’s Invest AI initiative is set to mobilise 200 billion euros for artificial intelligence, and the OWI offers a powerful foundation of open data for innovation. Furthermore, as Microsoft plans to retire access to the Bing index, the OWI will be able to offer an alternative index for European search engines.

After two and a half years of intensive research and development, anybody can now request access to the OWI by signing up at openwebindex.eu/auth/login. Note that the project provides a web index, not a search engine or an API. Users who want to build their own search engines or chatbots will need a working knowledge of how to apply web index data.
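Because the OWI is delivered as index data rather than a ready-made search engine, users have to build their own query layer on top of it. A common starting point is an inverted index, which maps each term to the documents that contain it. The sketch below shows that generic technique with simple boolean AND matching and no ranking. It is not the OWI's own tooling: the records, the `url` and `text` field names and the helper functions are hypothetical, and the real OWI data format is not assumed here.

```python
import re
from collections import defaultdict

# Hypothetical records standing in for rows of web index data.
# The field names "url" and "text" are assumptions for illustration only.
DOCS = [
    {"url": "https://example.org/a", "text": "Open web search for Europe"},
    {"url": "https://example.org/b", "text": "CERN releases physics data"},
    {"url": "https://example.org/c", "text": "An open index for web search research"},
]

TOKEN_RE = re.compile(r"[a-z0-9]+")


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphanumeric tokens."""
    return TOKEN_RE.findall(text.lower())


def build_inverted_index(docs: list[dict]) -> dict[str, set[int]]:
    """Map each term to the set of document positions that contain it."""
    index: dict[str, set[int]] = defaultdict(set)
    for doc_id, doc in enumerate(docs):
        for term in tokenize(doc["text"]):
            index[term].add(doc_id)
    return index


def search(index: dict[str, set[int]], docs: list[dict], query: str) -> list[str]:
    """Return URLs of documents containing every query term (boolean AND)."""
    terms = tokenize(query)
    if not terms:
        return []
    # .get() avoids inserting empty entries into the defaultdict.
    hits = set.intersection(*(index.get(t, set()) for t in terms))
    return [docs[i]["url"] for i in sorted(hits)]


if __name__ == "__main__":
    idx = build_inverted_index(DOCS)
    print(search(idx, DOCS, "open search"))
    # ['https://example.org/a', 'https://example.org/c']
```

A real search engine built on web index data would add relevance ranking (for example BM25) and work with far larger, disk-backed data structures; the in-memory dictionary here only illustrates the lookup idea.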
https://home.cern/news/news/computing/european-project-make-web-search-more-open-and-ethical?utm_source=miragenews&utm_medium=miragenews&utm_campaign=news
