From Open Information Extraction to Probabilistic Fusion: Semantic Retrieval Pipelines for Enterprise Knowledge Graph Construction

Authors

  • Sriram Ghanta, Senior Java Full Stack Developer, United States of America

DOI:

https://doi.org/10.15662/IJRAI.2025.0802010

Keywords:

Semantic Retrieval, Knowledge Graph Construction, Open Information Extraction, Enterprise Knowledge Graphs, Knowledge Fusion, Information Extraction Pipelines, Semantic Search

Abstract

The exponential growth of unstructured enterprise data, spanning documents, logs, emails, reports, and web content, has intensified the demand for scalable mechanisms that extract, organize, and retrieve knowledge in machine-interpretable forms. Knowledge Graphs (KGs) have consequently emerged as a foundational representation for modeling entities, relationships, and contextual semantics across heterogeneous and distributed information sources, enabling more advanced analytics, reasoning, and decision support. This article presents a systematic exploration of semantic retrieval pipelines for enterprise knowledge graph construction, tracing their evolution from early Open Information Extraction (OpenIE) systems to more sophisticated probabilistic knowledge fusion architectures. Drawing upon seminal systems such as TextRunner and SigmaKB, we analyze how successive pipeline stages (large-scale text ingestion, relation extraction, semantic filtering, entity normalization and disambiguation, and probabilistic knowledge fusion) work in concert to transform noisy, unstructured data into coherent and reliable enterprise knowledge graphs. The discussion synthesizes recurring architectural patterns observed across foundational systems; examines practical challenges encountered in large-scale enterprise deployments, such as noise management, ambiguity resolution, scalability, and trust; and highlights emerging directions toward AI-augmented semantic retrieval, where machine learning and neural representations increasingly complement symbolic knowledge representations to enhance robustness, adaptability, and semantic depth.

References

1. Banko, M. (2009). Open information extraction for the web. https://turing.cs.washington.edu/papers/banko-thesis.pdf

2. Banko, M., Cafarella, M. J., Soderland, S., Broadhead, M., & Etzioni, O. (2007). Open information extraction from the web. https://www.ijcai.org/Proceedings/07/Papers/429.pdf

3. Etzioni, O., Banko, M., Soderland, S., & Weld, D. S. (2008). Open information extraction from the web. https://doi.org/10.1145/1409360.1409378

4. Mintz, M., Bills, S., Snow, R., & Jurafsky, D. (2009). Distant supervision for relation extraction without labeled data. https://aclanthology.org/P09-1113.pdf

5. Nanchari, N. (2020). Remote Patient Monitoring in Healthcare: Leveraging Iot for Continuous Care. https://doi.org/10.5281/zenodo.15791053

6. Carlson, A., Betteridge, J., Kisiel, B., Settles, B., Hruschka, E. R., & Mitchell, T. M. (2010). Coupled semi-supervised learning for information extraction. https://dl.acm.org/doi/10.1145/1718487.1718501

7. Nanchari, N. (2020). Iot In Healthcare: A Review Of Technological Interventions And Implementation Models. https://doi.org/10.5281/zenodo.15795982

8. Garlan, D., Cheng, S. W., Huang, A. C., Schmerl, B., & Steenkiste, P. (2004). Rainbow: Architecture-based self-adaptation with reusable infrastructure. https://doi.org/10.1109/MC.2004.175

9. Mao, H., Alizadeh, M., Menache, I., & Kandula, S. (2016). Resource management with deep reinforcement learning. https://doi.org/10.1145/3005745.3005750

10. Rodriguez, M., Posse, C., & Zhang, E. (2016). Multiple probabilistic knowledge base fusion. https://www.vldb.org/pvldb/vol9/p1577-rodriguez.pdf

11. Suchanek, F. M., Kasneci, G., & Weikum, G. (2007). YAGO: A core of semantic knowledge. https://doi.org/10.1145/1242572.1242667

12. Auer, S., Bizer, C., Kobilarov, G., Lehmann, J., Cyganiak, R., & Ives, Z. (2007). DBpedia: A nucleus for a web of open data. https://doi.org/10.1007/978-3-540-76298-0_52

13. Hoffart, J., Suchanek, F. M., Berberich, K., & Weikum, G. (2011). Robust disambiguation of named entities in text. https://aclanthology.org/D11-1072.pdf

14. Vankayala, S. C. (2016). Reframing Enterprise Quality Engineering: The Emergence of Predictive and Cognitive Automation. https://doi.org/10.5281/zenodo.17839512

Published

2023-05-18

How to Cite

From Open Information Extraction to Probabilistic Fusion: Semantic Retrieval Pipelines for Enterprise Knowledge Graph Construction. (2023). International Journal of Research and Applied Innovations, 6(3), 8933-8940. https://doi.org/10.15662/IJRAI.2025.0802010