ACM/IEEE Joint Conference on Digital Libraries 2017
University of Toronto
JCDL 2017 | #JCDL@2017
Tuesday, June 20 • 14:00 - 15:30
Paper Session 02: Semantics and Linking

Pavlos Fafalios, Helge Holzmann, Vaibhav Kasturia and Wolfgang Nejdl. Building and Querying Semantic Layers for Web Archives (Full)

*VB Best Paper Award Nominee

Web archiving is the process of collecting portions of the Web to ensure the information is preserved for future exploitation. However, despite the increasing number of web archives worldwide, the absence of efficient and meaningful exploration methods remains a major hurdle to turning them into a usable and useful information source. In this paper, we elaborate on this problem and propose an RDF/S model and a distributed framework for building semantic profiles ("layers") that describe semantic information about the contents of web archives. A semantic layer allows describing metadata about the archived documents, annotating them with useful semantic information (like entities, concepts and events), and publishing all this data on the Web as Linked Data. Such structured repositories offer advanced query and integration capabilities and make web archives directly exploitable by other systems and tools. To demonstrate their query capabilities, we build and query semantic layers for three different types of web archives. An experimental evaluation showed that a semantic layer can answer information needs that existing keyword-based systems are not able to sufficiently satisfy.

Abhik Jana, Sruthi Mooriyath, Animesh Mukherjee and Pawan Goyal. WikiM: Metapaths based Wikification of Scientific Abstracts (Full)
In order to disseminate the exponentially growing body of knowledge being produced in the form of scientific publications, it would be best to design mechanisms that connect it with an already existing rich repository of concepts -- Wikipedia. Not only does this make scientific reading simple and easy (by connecting the concepts used in the scientific articles to their Wikipedia explanations), but it also improves the overall quality of the article. In this paper, we present a novel metapath-based method, WikiM, to efficiently wikify scientific abstracts -- a topic that has rarely been investigated in the literature. One of the prime motivations for this work comes from the observation that wikified abstracts of scientific documents help a reader decide, better than plain abstracts do, whether (s)he would be interested in reading the full article. We perform mention extraction mostly through traditional tf-idf measures coupled with a set of smart filters. The entity linking relies heavily on the rich citation and author publication networks. Our observation is that various metapaths defined over these networks can significantly enhance the overall performance of the system. For mention extraction and entity linking, we outperform most of the competing state-of-the-art techniques by a large margin, arriving at precision values of 72.42% and 73.8% respectively over a dataset from the ACL Anthology Network. In order to establish the robustness of our scheme, we wikify three other datasets and get precision values of 63.41%-94.03% and 67.67%-73.29% respectively for the mention extraction and the entity linking phases.

Jian Wu, Sagnik Ray Choudhury, Agnese Chiatti, Chen Liang and C. Lee Giles. HESDK: A Hybrid Approach to Extracting Scientific Domain Knowledge Entities (Short)
Automatic keyphrase extraction from scientific documents is a well-known problem. We investigate a variant of that problem: Scientific Domain Knowledge Entity (SDKE) extraction. Keyphrases are noun phrases that are important to the document. In contrast, an SDKE is a span of text that refers to a concept and can be classified as a process, material, task, dataset, etc. Supervised keyphrase extraction algorithms using non-sequential classifiers and global measures of informativeness (PMI, tf-idf) are good candidates for this task. Another approach is to use sequential labeling algorithms with local context from a sentence, as done in named entity recognition tasks. We show that these methods can complement each other and a simple merging can improve the extraction accuracy by 5-7 percentiles. We further propose several heuristics to improve the extraction accuracy. Our preliminary experiments suggest that it is possible to improve the accuracy of the sequential learner itself by utilizing the predictions of the non-sequential model.

Xiao Yang, Dafang He, Wenyi Huang, Zihan Zhou, Alexander Ororbia, Daniel Kifer and C. Lee Giles. Smart Library: Identifying Books in a Library using Richly Supervised Deep Scene Text (Short)
Physical library collections are valuable and long-standing resources for knowledge and learning. However, managing books on a large bookshelf and finding books on it often lead to tedious manual work, especially for large book collections where books might be missing or misplaced. Recently, deep neural-based models have achieved great success for scene text detection and recognition. Motivated by these recent successes, we aim to investigate their viability in facilitating book management, a task that introduces further challenges including large amounts of cluttered scene text, distortion, and varied lighting conditions. In this paper, we present a library inventory building and retrieval system based on scene text reading methods. We specifically design our scene text recognition model using rich supervision to accelerate training and achieve state-of-the-art performance on several benchmark datasets. Our proposed system has the potential to greatly reduce the amount of human labor required in managing book inventories as well as the space needed to store book information.


Faryaneh Poursardar

PhD Candidate, Texas A&M University
Web archive, HCI


Tuesday June 20, 2017 14:00 - 15:30
Room 325, Faculty of Information 140 St. George Street, Toronto, ON, M5S 3G6
