ACM/IEEE Joint Conference on Digital Libraries 2017
University of Toronto
JCDL 2017 | #JCDL@2017
Wednesday, June 21 • 14:00 - 15:30
Paper Session 07: Collection Building

Federico Nanni, Simone Paolo Ponzetto and Laura Dietz. Building Entity-Centric Event Collections (Full)

*Best Student Paper Award Nominee

Web archives preserve an unprecedented abundance of primary sources for the diachronic tracking, examination and -- ultimately -- understanding of major events and transformations in our society. A topic of interest is, for example, the rise of Euroscepticism as a consequence of the recent economic crisis.
We present an approach for building event-centric sub-collections from large archives. These sub-collections include not only the core documents related to the event itself but, even more importantly, documents that describe related aspects (e.g., premises and consequences). This is achieved by 1) identifying relevant concepts and entities from a knowledge base, and 2) detecting mentions of these entities in documents, which we interpret as an indicator of relevance. We extensively evaluate our system on two diachronic corpora, the New York Times Corpus and the US Congressional Record, and we test its performance on the TREC KBA Stream corpus, a large and publicly available web archive.
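The two steps named in the abstract can be illustrated with a minimal sketch. It is not the authors' system: it assumes the entity names have already been retrieved from a knowledge base, uses plain case-insensitive phrase matching in place of a real entity linker, and scores each document by the number of distinct entities it mentions. The function names and sample data are hypothetical.

```python
import re
from typing import Dict, List, Tuple


def count_entity_mentions(text: str, entities: List[str]) -> Dict[str, int]:
    """Count case-insensitive whole-phrase mentions of each entity name."""
    counts = {}
    for name in entities:
        # Assumption: simple phrase matching stands in for entity linking.
        pattern = r"\b" + re.escape(name) + r"\b"
        counts[name] = len(re.findall(pattern, text, flags=re.IGNORECASE))
    return counts


def rank_documents(docs: Dict[str, str], entities: List[str]) -> List[Tuple[str, int]]:
    """Rank documents by how many distinct entities they mention.

    Ties are broken by document id so the ordering is deterministic.
    """
    scored = []
    for doc_id, text in docs.items():
        counts = count_entity_mentions(text, entities)
        distinct = sum(1 for c in counts.values() if c > 0)
        scored.append((doc_id, distinct))
    return sorted(scored, key=lambda pair: (-pair[1], pair[0]))


# Hypothetical entity list and documents, for illustration only.
entities = ["European Union", "UKIP", "euro crisis"]
docs = {
    "a": "The European Union and the euro crisis.",
    "b": "Football results.",
    "c": "UKIP criticised the European Union.",
}
ranking = rank_documents(docs, entities)
```

Counting distinct entities rather than raw mention frequency is one possible design choice; it keeps a long document that repeats a single name from outranking a short one that touches several related entities.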

Jan R. Benetka, Krisztian Balog and Kjetil Nørvåg. Towards Building a Knowledge Base of Monetary Transactions from a News Collection (Full)
We address the problem of extracting structured representations of economic events from a large corpus of news articles, using a combination of natural language processing and machine learning techniques. The developed techniques allow for semi-automatic population of a financial knowledge base, which, in turn, may be used to support a range of data mining and exploration tasks. The key challenge we face in this domain is that the same event is often reported multiple times, with varying correctness of details. We address this challenge by first collecting all information pertinent to a given event from the entire corpus, then considering all possible representations of the event, and finally using a supervised learning method to rank these representations by their associated confidence scores. A main innovative element of our approach is that it jointly extracts and stores all attributes of the event as a single representation (quintuple). Using a purpose-built test set, we demonstrate that our supervised learning approach can achieve a 25% improvement in F1-score over baseline methods that consider the earliest, the latest or the most frequent reporting of the event.
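The three baselines mentioned in the abstract (earliest, latest, and most frequent reporting) and the idea of ranking candidates by confidence can be sketched as follows. This is an illustrative sketch only: the abstract does not list the quintuple's attributes, so candidate representations are modelled as opaque tuples, and the `score` callable is a hypothetical stand-in for the trained model.

```python
from collections import Counter
from typing import Callable, Hashable, List, Tuple

# A report pairs a timestamp with one candidate representation of the event.
# The representation is an opaque hashable value here (assumption).
Report = Tuple[int, Hashable]


def earliest(reports: List[Report]) -> Hashable:
    """Baseline: trust the first report of the event."""
    return min(reports, key=lambda r: r[0])[1]


def latest(reports: List[Report]) -> Hashable:
    """Baseline: trust the most recent report of the event."""
    return max(reports, key=lambda r: r[0])[1]


def most_frequent(reports: List[Report]) -> Hashable:
    """Baseline: trust the representation reported most often."""
    counts = Counter(rep for _, rep in reports)
    return counts.most_common(1)[0][0]


def rank_by_confidence(reports: List[Report],
                       score: Callable[[Hashable], float]) -> List[Hashable]:
    """Rank distinct candidates by a confidence score, highest first.

    `score` is a placeholder for a supervised model's output.
    """
    candidates = list(dict.fromkeys(rep for _, rep in reports))
    return sorted(candidates, key=score, reverse=True)


# Hypothetical reports of one event with differing amounts.
reports = [
    (1, ("A", "B", 100)),
    (2, ("A", "B", 120)),
    (3, ("A", "B", 120)),
    (4, ("A", "B", 110)),
]
ranked = rank_by_confidence(reports, score=lambda rep: -abs(rep[2] - 120))
```

In this toy data each baseline picks a different candidate, which shows why a single fixed heuristic can fail when reports disagree.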

Alexander Nwala, Michael Nelson, Michele Weigle, Adam Ziegler and Anastasia Aizman. Local Memory Project: providing tools to build collections of stories for local events from local sources (Full)
The national (non-local) news media has different priorities than the local news media. If one seeks to build a collection of stories about local events, the national news media may be insufficient, with the exception of local news which "bubbles" up to the national news media. If we rely exclusively on national media, or build collections exclusively on their reports, we could be late to the important milestones which precipitate major local events, and thus run the risk of losing important stories due to link rot and content drift. Consequently, it is important to consult local sources affected by local events. Our goal is to provide a suite of tools (beginning with two) under the umbrella of the Local Memory Project (LMP) to help users and small communities discover, collect, build, archive, and share collections of stories for important local events by leveraging local news sources. The first service (Geo) returns a list of local news sources (newspapers, TV and radio stations) in order of proximity to a user-supplied zip code. The second service (Local Stories Collection Generator) discovers, collects and archives a collection of news stories about a story or event represented by a user-supplied query and zip code pair. We evaluated 20 pairs of collections, Local (generated by our system) and non-Local, by measuring archival coverage, tweet index rate, temporal range, precision, and sub-collection overlap. Our experimental results showed Local and non-Local collections with archive rates of 0.63 and 0.83, respectively, and tweet index rates of 0.59 and 0.80, respectively. Local collections produced older stories than non-Local collections, but had a lower precision (relevance) of 0.77 compared to a non-Local precision of 0.91. These results indicate that Local collections are less exposed, and thus less popular, than their non-Local counterparts.
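Ordering sources by proximity, as the Geo service is described as doing, can be sketched with a great-circle distance. This is not the Local Memory Project's code: it assumes the zip code has already been resolved to a latitude/longitude pair, uses the haversine formula as one reasonable distance measure, and the source names and coordinates are made up for illustration.

```python
import math
from typing import List, Tuple

EARTH_RADIUS_KM = 6371.0

# (name, latitude, longitude) for a news source; values are hypothetical.
Source = Tuple[str, float, float]


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in kilometres between two points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def sources_by_proximity(origin: Tuple[float, float],
                         sources: List[Source]) -> List[Source]:
    """Return sources sorted from nearest to farthest from `origin`."""
    lat0, lon0 = origin
    return sorted(sources, key=lambda s: haversine_km(lat0, lon0, s[1], s[2]))


# Assumption: the user's zip code was resolved to (0.0, 0.0) elsewhere.
sources = [("far", 0.0, 5.0), ("near", 0.0, 1.0), ("here", 0.0, 0.0)]
ordered = sources_by_proximity((0.0, 0.0), sources)
```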

Moderators

Justin F. Brunelle

Lead Researcher, The MITRE Corporation
Lead Researcher at The MITRE Corporation and Adjunct Assistant Professor at Old Dominion University. Research interests include web science, digital preservation, cloud computing, and emerging technologies.

Speakers

Federico Nanni

Researcher, University of Mannheim
Michael Nelson

Professor, Old Dominion University

Alexander C. Nwala

PhD Student/Research Assistant, Old Dominion University

Michele Weigle

Associate Professor, Old Dominion University

Adam Ziegler

Managing Director, Harvard Library Innovation Lab, Harvard University
Adam Ziegler is an attorney and member of the Library Innovation Lab at Harvard Law School where he leads technology projects like Free the Law, Perma.cc and H2O. Before taking that role, he co-founded a legal tech startup named Mootus and represented clients for over a decade at...


Wednesday June 21, 2017 14:00 - 15:30
Innis Town Hall 2 Sussex Ave, Toronto, ON M5S 1J5
