Adamou & D’Aquin: "Linked Data Principles for Data Lakes" 145–169 (PDF, 2020)
Shared by Stian Danenbarger, 1 save total
"Large Language Models (LLMs) are prone to generating factually incorrect information when responding to queries that involve numerical and statistical data or other timely facts. In this paper, we present an approach for enhancing the accuracy of LLMs by integrating them with Data Commons, a vast, open-source repository of public statistics from trusted organizations like the United Nations (UN), Center for Disease Control and Prevention (CDC) and global census bureaus. We explore two primary methods: Retrieval Interleaved Generation (RIG), where the LLM is trained to produce natural language queries to retrieve data from Data Commons, and Retrieval Augmented Generation (RAG), where relevant data tables are fetched from Data Commons and used to augment the LLM's prompt. We evaluate these methods on a diverse set of queries, demonstrating their effectiveness in improving the factual accuracy of LLM outputs. Our work represents an early step towards building more trustworthy and reliable LLMs that are grounded in verifiable statistical data and capable of complex factual reasoning"
Shared by Stian Danenbarger, 1 save total
"Large language models (LLMs) have achieved remarkable advancements in natural language understanding and generation. However, one major issue towards their widespread deployment in the real world is that they can generate "hallucinated" answers that are not factual. Towards this end, this paper focuses on improving LLMs by grounding their responses in retrieved passages and by providing citations. We propose a new framework, AGREE, Adaptation for GRounding EnhancEment, that improves the grounding from a holistic perspective. Our framework tunes LLMs to selfground the claims in their responses and provide accurate citations to retrieved documents. This tuning on top of the pre-trained LLMs requires well-grounded responses (with citations) for paired queries, for which we introduce a method that can automatically construct such data from unlabeled queries. The selfgrounding capability of tuned LLMs further grants them a test-time adaptation (TTA) capability that can actively retrieve passages to support the claims that have not been grounded, which iteratively improves the responses of LLMs. Across five datasets and two LLMs, our results show that the proposed tuningbased AGREE framework generates superior grounded responses with more accurate citations compared to prompting-based approaches and post-hoc citing-based approaches"
Shared by Stian Danenbarger, 1 save total
"Gradformer, a method innovatively integrating GT with the intrinsic inductive bias by applying an exponential decay mask to the attention matrix. Specifically, the values in the decay mask matrix diminish exponentially, correlating with the decreasing node proximities within the graph structure. This design enables Gradformer to retain its ability to capture information from distant nodes while focusing on the graph's local details"
Shared by Stian Danenbarger, 1 save total
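As a rough illustration of the exponential decay mask described in the Gradformer excerpt above, the sketch below builds a mask whose entries shrink with graph distance and applies it to an attention matrix. Several details are assumptions, not taken from the excerpt: proximity is measured as shortest-path hop count, the decay base is a fixed `gamma`, the mask is multiplied into already-normalized attention weights, and rows are renormalized afterwards. All function names are hypothetical.

```python
from collections import deque

INF = float("inf")


def hop_distances(adjacency):
    """All-pairs shortest-path hop counts, using one BFS per source node."""
    n = len(adjacency)
    dist = [[INF] * n for _ in range(n)]
    for source in range(n):
        dist[source][source] = 0
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in adjacency[u]:
                if dist[source][v] == INF:
                    dist[source][v] = dist[source][u] + 1
                    queue.append(v)
    return dist


def decay_mask(dist, gamma=0.5):
    """Mask entry gamma ** hops; unreachable pairs get weight 0 (assumption)."""
    return [[0.0 if d == INF else gamma ** d for d in row] for row in dist]


def apply_mask(attention, mask):
    """Elementwise product of attention and mask, then row renormalization."""
    masked = [[a * m for a, m in zip(a_row, m_row)]
              for a_row, m_row in zip(attention, mask)]
    return [[x / sum(row) for x in row] for row in masked]


# Path graph 0 - 1 - 2, given as adjacency lists.
adjacency = [[1], [0, 2], [1]]
dist = hop_distances(adjacency)
mask = decay_mask(dist, gamma=0.5)

# Uniform attention, so the effect of the mask is easy to see.
uniform = [[1 / 3] * 3 for _ in range(3)]
weighted = apply_mask(uniform, mask)
```

With uniform input attention, node 0 ends up attending most to itself, less to its neighbor, and least (but still non-zero) to the node two hops away, which matches the excerpt's point about keeping distant information while emphasizing local structure.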
"Resource Description Framework (RDF) and Property Graph (PG) are
the two most commonly used data models for representing, storing, and querying
graph data. We present Expressive Reasoning Graph Store (ERGS) – a graph
store built on top of JanusGraph (a Property Graph store) that also allows storing
and querying of RDF datasets. First, we describe how RDF data can be translated
into a Property Graph representation and then describe a query translation module
that converts SPARQL queries into a series of Gremlin traversals. The converters
and translators thus developed can allow any Apache Tinkerpop compliant graph
database to store and query RDF datasets. We demonstrate the effectiveness of
our proposed approach using JanusGraph as the base Property Graph store and
compare its performance with standard RDF systems"
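The excerpt above does not say how ERGS maps RDF onto a property graph, so the following is only a minimal sketch of one common convention, not the paper's method: triples whose object is a literal become vertex properties, and triples whose object is a resource become labeled edges. The `Literal` class and `rdf_to_property_graph` function are hypothetical names introduced for illustration, and IRIs are represented as plain strings.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Literal:
    """Marks an RDF literal so it can be told apart from an IRI string."""
    value: object


def rdf_to_property_graph(triples):
    """Translate (subject, predicate, object) triples into vertices and edges.

    Returns (vertices, edges), where vertices maps each resource to a dict of
    multi-valued properties and edges is a list of (source, label, target).
    """
    vertices = {}
    edges = []
    for subject, predicate, obj in triples:
        vertices.setdefault(subject, {})
        if isinstance(obj, Literal):
            # Literal object: store it as a property on the subject vertex.
            vertices[subject].setdefault(predicate, []).append(obj.value)
        else:
            # Resource object: make sure the vertex exists, then add an edge.
            vertices.setdefault(obj, {})
            edges.append((subject, predicate, obj))
    return vertices, edges


triples = [
    ("ex:alice", "foaf:name", Literal("Alice")),
    ("ex:alice", "foaf:knows", "ex:bob"),
]
vertices, edges = rdf_to_property_graph(triples)
```

Under this convention, a SPARQL triple pattern with a literal object would correspond to a property filter on a vertex, and one with a resource object to an edge traversal; a real translator also has to handle blank nodes, datatypes, and named graphs, which this sketch ignores.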
"This paper presents an improvement proposal for an ontology-driven multi-level conceptual model for the data catalogue domain. Data catalogues gather metadata that describe resources in different and heterogeneous digital platforms (repositories). They are supported by Information Systems (IS) that use these descriptors to provide visibility and support resources exploration and analysis. Domain ontologies are essential to promote quality ISs, as they are developed to reflect the intended reality. The proposed conceptual model is well-founded on the Unified Foundational Ontology and the Multi-Level Theory, based on the widely used DCAT vocabulary, a standardized metadata schema for describing datasets and data services. The resulting model addresses ambiguities and contemplates high-level types contributing to the conformance of domain concepts and relationships. In addition, they provide knowledge about the different types of resource descriptors and relationships contained in a specific catalogue, favoring its management. The paper enhances the previous model by extending it to handle descriptors representing a dataset according to the data equivalence across multiple distributions. We also demonstrate the model by describing a dataset with no data equivalence in its distributions, taken from a real-world scenario, thus providing a structured representation to manage metadata sets in the data catalogue domain"
Shared by Stian Danenbarger, 1 save total
"The Medical Device and Healthcare
Information Technology (HIT) industries have not
achieved safe PNP cross-manufacturer (heterogeneous)
interoperability although it has been achieved decades
ago in other safety critical industries. We believe that the
Levels of Conceptual Interoperability Model (LCIM) [1]
offers an essential account of the disparity and thereby
offers insight for how to achieve safe PNP cross-
manufacturer interoperability in HIT. The LCIM is a
conceptual framework for interoperability first
developed for military simulation and modeling. We
have expanded its scope and detail while applying it to
medical devices. Our results show that safe
interoperability minimally requires system components
that are aligned about a conceptual model (i.e.
manufacturers are operating at level 6). Furthermore,
such devices can be assured to be safely interoperable
cross-manufacturer only if different manufacturers
share the conceptual model embodied by the
communicating devices. We identify some root causes
preventing this realization"
Shared by Stian Danenbarger, 1 save total