Exact-word bias
Keyword matching is literal. It can miss related work expressed with different terminology, regional phrasing or discipline-specific language.
Scholarly knowledge is published in many languages, yet relevant work can remain invisible when discovery depends on exact words. This guided pilot compares keyword, semantic and hybrid retrieval over a subset of LA Referencia.
The barrier is not research quality. It is visibility: useful work can be missed simply because the query and the record use different words or languages.
Equivalent ideas become disconnected when a search only retrieves records that share the language and wording of the original query.
When discovery tools favour dominant languages, relevant scholarship in other languages becomes harder for global audiences to find and use.
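The effect of literal matching can be illustrated with a minimal sketch. The record titles, the query and the helper names (`tokens`, `keyword_match`) are hypothetical and chosen only for illustration; they are not taken from the pilot index, and the pilot's actual keyword engine is not described here.

```python
import re
import unicodedata

def tokens(text: str) -> set[str]:
    """Lowercase, strip accents, and split into word tokens."""
    folded = unicodedata.normalize("NFKD", text.lower())
    folded = "".join(ch for ch in folded if not unicodedata.combining(ch))
    return set(re.findall(r"[a-z0-9]+", folded))

def keyword_match(query: str, record: str) -> bool:
    """True only if every query token appears literally in the record."""
    return tokens(query) <= tokens(record)

# Hypothetical titles describing the same topic in three languages
records = {
    "en": "Food security in rural communities",
    "pt": "Segurança alimentar em comunidades rurais",
    "es": "Seguridad alimentaria en comunidades rurales",
}

query = "food security"
hits = [lang for lang, title in records.items() if keyword_match(query, title)]
print(hits)  # ['en']
```

Only the English title shares the query's words, so the Portuguese and Spanish records stay invisible even though they describe the same topic.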
Semantic multilingual search compares meaning across languages and phrasing. It complements keyword search: it does not translate documents, and proximity of meaning does not guarantee relevance.
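As a rough sketch of what comparing meaning can look like in practice, records and queries are often represented as vectors and ranked by cosine similarity. The three-dimensional vectors below are hand-made stand-ins, not the output of any embedding model, and the titles are hypothetical; the pilot's actual model and index are not described here.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

# Hand-made toy vectors standing in for multilingual embeddings (assumption)
vectors = {
    "Segurança alimentar em comunidades rurais": [0.9, 0.1, 0.0],
    "Seguridad alimentaria en comunidades rurales": [0.8, 0.2, 0.1],
    "Galaxy formation in the early universe": [0.0, 0.1, 0.9],
}
query_vector = [0.85, 0.15, 0.05]  # stand-in for the query "food security"

# Rank records from most to least similar to the query
ranked = sorted(vectors, key=lambda t: cosine(query_vector, vectors[t]), reverse=True)
for title in ranked:
    print(f"{cosine(query_vector, vectors[title]):.3f}  {title}")
```

In this toy space the Portuguese and Spanish titles rank above the unrelated one despite sharing no words with the English query. A high score still only signals closeness in the vector space, not relevance.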
Keyword search finds literal terms in indexed metadata. It is useful for precise known terms, especially when the query and the record share vocabulary and language.
Semantic search retrieves nearby meanings within a shared multilingual space, even when the query and the record use different languages or phrasing.
Hybrid search combines literal and semantic rankings. It can preserve exact-term precision while adding meaning-based discovery.
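One common way to combine two rankings is reciprocal rank fusion (RRF). The sketch below is an illustration under that assumption: the text does not state which fusion method the pilot uses, and the record identifiers, the function name and the constant `k=60` are hypothetical choices.

```python
from collections import defaultdict

def reciprocal_rank_fusion(rankings, k=60):
    """Fuse ranked lists of record ids: score(d) = sum over lists of 1 / (k + rank)."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by descending fused score; break ties by id for a stable result
    return sorted(scores, key=lambda d: (-scores[d], d))

# Hypothetical outputs of the keyword and semantic retrievers for one query
keyword_ranking = ["rec-en-1", "rec-en-2"]
semantic_ranking = ["rec-pt-1", "rec-en-1", "rec-es-1"]

hybrid = reciprocal_rank_fusion([keyword_ranking, semantic_ranking])
print(hybrid)  # ['rec-en-1', 'rec-pt-1', 'rec-en-2', 'rec-es-1']
```

The record found by both retrievers rises to the top, while records found by only one of them remain in the fused list, which is how a hybrid ranking can keep exact-term hits and still surface cross-language results.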
This is a pilot over an indexed subset, not an evaluation of the complete LA Referencia network. Deposita is used as a thematic control set: it confirms that each selected topic is represented before multilingual queries are tested without repository filters.
Each topic starts with an original Portuguese phrase verified in Deposita. Every multilingual card opens the unfiltered three-column comparison over the pilot subset.
Semantic search can also reduce exact-word bias within one language. These simple alternatives use the same editorial review as the multilingual examples.
Editorial review makes the comparison auditable without presenting an exploratory prototype as a comprehensive benchmark.
Deposita selects the topics; it does not filter the public demonstration links.
This guided demo is based on work by the LA Referencia team within the project "Expansion and innovation of the federated Open Science infrastructure in Latin America", funded by Invest in Open Infrastructure (IOI).
Its conceptual framing is informed by "Enhancing Visibility Across Languages: Semantic Multilingual Search for Scholarly Content", Lautaro Matas on behalf of COAR · Version 1.0 · 6 November 2025.