Adamou & D’Aquin: "Linked Data Principles for Data Lakes", pp. 145–169 (PDF, 2020)

Shared by Stian Danenbarger, like and 1 save total

LCM Team et al: "Large Concept Models: Language Modeling in a Sentence Representation Space" (PDF, 2024)

Fungwacharakorn et al: "Layer-of-Thoughts Prompting (LoT): Leveraging LLM-Based Retrieval with Constraint Hierarchies" (PDF, 2025)

Radhakrishnan et al: "Knowing When to Ask - Bridging Large Language Models and Data" (PDF, 2024)

"Large Language Models (LLMs) are prone to generating factually incorrect information when responding to queries that involve numerical and statistical data or other timely facts. In this paper, we present an approach for enhancing the accuracy of LLMs by integrating them with Data Commons, a vast, open-source repository of public statistics from trusted organizations like the United Nations (UN), Center for Disease Control and Prevention (CDC) and global census bureaus. We explore two primary methods: Retrieval Interleaved Generation (RIG), where the LLM is trained to produce natural language queries to retrieve data from Data Commons, and Retrieval Augmented Generation (RAG), where relevant data tables are fetched from Data Commons and used to augment the LLM's prompt. We evaluate these methods on a diverse set of queries, demonstrating their effectiveness in improving the factual accuracy of LLM outputs. Our work represents an early step towards building more trustworthy and reliable LLMs that are grounded in verifiable statistical data and capable of complex factual reasoning"

Ye et al: "Effective Large Language Model Adaptation for Improved Grounding and Citation Generation" (PDF, 2024)

"Large language models (LLMs) have achieved remarkable advancements in natural language understanding and generation. However, one major issue towards their widespread deployment in the real world is that they can generate "hallucinated" answers that are not factual. Towards this end, this paper focuses on improving LLMs by grounding their responses in retrieved passages and by providing citations. We propose a new framework, AGREE, Adaptation for GRounding EnhancEment, that improves the grounding from a holistic perspective. Our framework tunes LLMs to self-ground the claims in their responses and provide accurate citations to retrieved documents. This tuning on top of the pre-trained LLMs requires well-grounded responses (with citations) for paired queries, for which we introduce a method that can automatically construct such data from unlabeled queries. The self-grounding capability of tuned LLMs further grants them a test-time adaptation (TTA) capability that can actively retrieve passages to support the claims that have not been grounded, which iteratively improves the responses of LLMs. Across five datasets and two LLMs, our results show that the proposed tuning-based AGREE framework generates superior grounded responses with more accurate citations compared to prompting-based approaches and post-hoc citing-based approaches"

Liu et al: "Gradformer: Graph Transformer with Exponential Decay" (PDF, 2024)

"Gradformer, a method innovatively integrating GT with the intrinsic inductive bias by applying an exponential decay mask to the attention matrix. Specifically, the values in the decay mask matrix diminish exponentially, correlating with the decreasing node proximities within the graph structure. This design enables Gradformer to retain its ability to capture information from distant nodes while focusing on the graph's local details"
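
The decay mask described in this abstract can be illustrated with a minimal sketch. It assumes node proximity is measured as shortest-path hop count and that the mask is multiplied element-wise into an already-normalized attention matrix; the function names, the fixed `gamma` value, and the handling of unreachable pairs are illustrative assumptions, not the authors' implementation.

```python
from collections import deque


def hop_distances(adj):
    """All-pairs shortest-path hop counts via BFS; unreachable pairs stay None."""
    n = len(adj)
    dist = [[None] * n for _ in range(n)]
    for src in range(n):
        dist[src][src] = 0
        queue = deque([src])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if dist[src][v] is None:
                    dist[src][v] = dist[src][u] + 1
                    queue.append(v)
    return dist


def decay_mask(adj, gamma=0.5):
    """mask[i][j] = gamma ** hops(i, j); 0.0 for unreachable pairs (an assumption)."""
    return [
        [0.0 if d is None else gamma ** d for d in row]
        for row in hop_distances(adj)
    ]


def apply_mask(attn, mask):
    """Element-wise product of attention weights and the decay mask."""
    return [
        [a * m for a, m in zip(attn_row, mask_row)]
        for attn_row, mask_row in zip(attn, mask)
    ]


# Path graph 0 - 1 - 2 - 3, given as adjacency lists.
adj = [[1], [0, 2], [1, 3], [2]]
mask = decay_mask(adj, gamma=0.5)

# Uniform attention, so the effect of the mask is easy to see.
attn = [[0.25] * 4 for _ in range(4)]
masked = apply_mask(attn, mask)
print(mask[0])
print(masked[0])
```

Distant nodes keep a nonzero weight, so long-range information is attenuated rather than cut off, which matches the abstract's description of retaining distant information while emphasizing local structure.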

Neelam et al: "Expressive Reasoning Graph Store: A Unified Framework for Managing RDF and Property Graph Databases" (PDF, 2022)

"Resource Description Framework (RDF) and Property Graph (PG) are the two most commonly used data models for representing, storing, and querying graph data. We present Expressive Reasoning Graph Store (ERGS) – a graph store built on top of JanusGraph (a Property Graph store) that also allows storing and querying of RDF datasets. First, we describe how RDF data can be translated into a Property Graph representation and then describe a query translation module that converts SPARQL queries into a series of Gremlin traversals. The converters and translators thus developed can allow any Apache Tinkerpop compliant graph database to store and query RDF datasets. We demonstrate the effectiveness of our proposed approach using JanusGraph as the base Property Graph store and compare its performance with standard RDF systems"

storing and querying RDF graphs on a Property Graph store
build upon and extend the formal framework by Hartig [10] for transforming RDF to Property Graphs and develop a query translator to convert SPARQL queries into their equivalent Gremlin traversals
the first complete solution to enable storage and querying of RDF datasets in Property Graph stores
A predicate can either be mapped as a node property or as an edge to subject node of the triple
Hartig [10], is a space-efficient and optimized data transformation method that utilizes the object type of the triples to reduce the number of nodes and edges in the resultant graph
In the simplest form of transformation, each RDF triple t{s,p,o} can be uniformly converted into two nodes, one each for s and o with IRI as property, and an edge with label p
literal objects are stored as properties of the subject nodes for the corresponding triples
Other associated information of the literal (such as datatype and language information) is stored as meta-properties, i.e., properties of the property
Fig. 2. Triples Transform Strategy: <subject> <predicate> <referent> maps to a node (iri=subject) linked by an edge labeled predicate to a node (iri=referent); <subject> <predicate> "literal"@en maps to a node (iri=subject) with the property predicate="literal" {language:"en"}
Metadata is stored as subgraph in same graph, but it is hidden from user
the metadata subgraph contains one node for each distinct predicate value in the RDF input and stores the type of mapping used
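
The transformation described in these annotations can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the ERGS code: the `Literal` class, the dictionary layout, and the function name are hypothetical, and the per-predicate mapping record is kept in a plain dictionary here rather than in a hidden metadata subgraph.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Literal:
    """Hypothetical stand-in for an RDF literal with optional language/datatype."""
    value: str
    language: Optional[str] = None
    datatype: Optional[str] = None


def triples_to_property_graph(triples):
    """Convert (subject, predicate, object) triples into nodes, edges, and mapping info."""
    nodes = {}              # iri -> {"iri": iri, "properties": {predicate: [entries]}}
    edges = []              # (subject_iri, predicate, object_iri)
    predicate_mapping = {}  # predicate -> set of mapping kinds seen for it

    def node(iri):
        return nodes.setdefault(iri, {"iri": iri, "properties": {}})

    for s, p, o in triples:
        subject = node(s)
        if isinstance(o, Literal):
            # Literal object: store as a property of the subject node, with
            # language and datatype kept as meta-properties of that property.
            meta = {}
            if o.language is not None:
                meta["language"] = o.language
            if o.datatype is not None:
                meta["datatype"] = o.datatype
            subject["properties"].setdefault(p, []).append(
                {"value": o.value, "meta": meta}
            )
            predicate_mapping.setdefault(p, set()).add("property")
        else:
            # IRI object: create (or reuse) the object node and add a labeled edge.
            node(o)
            edges.append((s, p, o))
            predicate_mapping.setdefault(p, set()).add("edge")

    return nodes, edges, predicate_mapping


triples = [
    ("ex:alice", "ex:knows", "ex:bob"),
    ("ex:alice", "ex:name", Literal("Alice", language="en")),
]
nodes, edges, predicate_mapping = triples_to_property_graph(triples)
print(nodes["ex:alice"])
print(edges)
print(predicate_mapping)
```

Recording the mapping kind per predicate mirrors the role the annotations assign to the metadata subgraph: a query translator needs to know whether a predicate should be looked up as a node property or traversed as an edge.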

Borges et al: "Ontology-Driven Multi-Level Conceptual Modeling for Dataset and Distributions Descriptions" (PDF, 2023)

"This paper presents an improvement proposal for an ontology-driven multi-level conceptual model for the data catalogue domain. Data catalogues gather metadata that describe resources in different and heterogeneous digital platforms (repositories). They are supported by Information Systems (IS) that use these descriptors to provide visibility and support resources exploration and analysis. Domain ontologies are essential to promote quality ISs, as they are developed to reflect the intended reality. The proposed conceptual model is well-founded on the Unified Foundational Ontology and the Multi-Level Theory, based on the widely used DCAT vocabulary, a standardized metadata schema for describing datasets and data services. The resulting model addresses ambiguities and contemplates high-level types contributing to the conformance of domain concepts and relationships. In addition, they provide knowledge about the different types of resource descriptors and relationships contained in a specific catalogue, favoring its management. The paper enhances the previous model by extending it to handle descriptors representing a dataset according to the data equivalence across multiple distributions. We also demonstrate the model by describing a dataset with no data equivalence in its distributions, taken from a real-world scenario, thus providing a structured representation to manage metadata sets in the data catalogue domain"

Fernando et al: "Promptbreeder: Self-Referential Self-Improvement Via Prompt Evolution" (PDF, 2024)

Robkin et al: "Levels of Conceptual Interoperability Model for Healthcare: Framework for Safe Medical Device Interoperability" (PDF, 2015)

"The Medical Device and Healthcare Information Technology (HIT) industries have not achieved safe PNP cross-manufacturer (heterogeneous) interoperability although it has been achieved decades ago in other safety critical industries. We believe that the Levels of Conceptual Interoperability Model (LCIM) [1] offers an essential account of the disparity and thereby offers insight for how to achieve safe PNP cross-manufacturer interoperability in HIT. The LCIM is a conceptual framework for interoperability first developed for military simulation and modeling. We have expanded its scope and detail while applying it to medical devices. Our results show that safe interoperability minimally requires system components that are aligned about a conceptual model (i.e. manufacturers are operating at level 6). Furthermore, such devices can be assured to be safely interoperable cross-manufacturer only if different manufacturers share the conceptual model embodied by the communicating devices. We identify some root causes preventing this realization"
