Málaga Workshop
The role of text mining in curation workflows
Goals of the meeting
At the Málaga meeting, one workshop will bring together users of information from the Gene Regulation Knowledge Commons, curators from several domains of gene regulation information, and text miners. We hope this will stimulate discussion and foster a shared understanding of the challenges, opportunities, and ways forward.
Workshop Organisers:
- María del Mar Roldán García
- Martin Krallinger
- Fabio Rinaldi
- Marcio Luis Acencio
- Martin Kuiper
21 February:
Morning session:
- 09:00 – 09:10: Workshop – scope and aims (Martin Kuiper)
- 09:10 – 09:50: Session 1 – Gentle introduction to text mining and gene regulation (Chair: Astrid Lægreid)
- 09:10 – 09:30: Text-mining basics and how it can assist curation (Fabio Rinaldi)
- 09:30 – 09:50: Fundamentals of gene regulation and the GREEKC Working Group 2 (Sandra Orchard)
- 09:50 – 12:20: Session 2 – Curation needs in the five different areas of Working Group 2 (Chair: To be announced)
- 09:50 – 10:05: Protein level: transcription factors and co-factors (Ruth Lovering)
- 10:05 – 10:20: Non-coding RNA level: miRNAs and beyond (Simona Panni)
- 10:20 – 10:35: Genome level: Transcription Factor Binding Sites and regulatory elements (Colin Logie)
- 10:35 – 10:50: Interaction level: Dealing with Causality (Vasundra Touré)
- 10:50 – 11:20: Coffee break
- 11:20 – 11:50: Discussion: How can text mining meet these diverse and precise curation needs?
- 11:50 – 12:20: Initial considerations about the hackathon/jamboree session (Fabio Rinaldi / Martin Krallinger)
- 12:20 – 13:00: Keynote lecture – Title to be announced (Alfonso Valencia)
- 13:00 – 14:00: Lunch
Afternoon session:
- 14:00 – 16:00: Session 3 – Text mining solutions – what works and what does not (Chair: To be announced)
- 14:00 – 14:20: Text mining in the Wikipathways initiative (Susan Coort)
- 14:20 – 14:40: ExTRI: extraction of DbTF-TG interactions from abstracts (Fabio Curi Paixao)
- 14:40 – 15:00: The Dark Space Project (Pablo Porras)
- 15:00 – 15:20: LION LBD: a literature-based discovery system for cancer biology (Sampo Pyysalo)
- 15:20 – 15:40: How can the Visual Syntax Method (//scicura.org/info.html) meet text mining?
- 15:40 – 16:00: Discussion: collecting questions and topics for further discussion
- 16:00 – 16:30: Coffee break
- 16:30 – 17:30: Session 4 – Applicability of text mining for GREEKC objectives (Chairs: Fabio Rinaldi, Martin Krallinger and Martin Kuiper)
- 16:30 – 17:30: Open discussion on topics raised in Session 3:
- Which areas offer low-hanging fruit, and where are the bottlenecks?
- Which uses are best addressed by which text mining pipeline?
- Other
- 16:30 – 17:30: Introduction to the hackathon/jamboree tasks:
- 16:30 – 17:00: Text mining pipeline hackathon (Fabio Rinaldi)
- 17:00 – 17:30: Curation jamboree (Martin Krallinger)
Social Activities:
- 19:00: Guided tour around the city centre (meeting point at Plaza de la Merced)
- 20:30: Dinner at “La Reserva del Olivo”
22 February:
Morning session:
- 09:00 – 09:40: Keynote lecture – Using text mining in biomedical databases (Lars Juhl Jensen)
- Plenary session:
- 09:45 – 13:00: Session 5 – Quality metrics and sharing of text mining (Chair: Pablo Porras)
- 09:45 – 10:00: Extracting microRNA-gene relations from biomedical literature using distant supervision (André Lamurias)
- 10:00 – 10:15: Quality metrics for text mined data: Dark Space (Pablo Porras)
- 10:30 – 11:00: Discussion: how can we improve quality metrics (QM)? Are there other handles we can use to establish confidence and trust?
- 11:00 – 11:30: Coffee break
- 11:30 – 11:50: Sharing the ExTRI resource via BioGateway (Martin Kuiper)
- 11:50 – 12:10: Configurable web-services for biomedical document annotation (Sergio Matos)
- 12:10 – 12:30: Europe PMC SciLite annotations (Xiao Yang)
- 12:30 – 13:00: Discussion: the future of sharing text-mined data related to gene regulation, and provenance checking
- Text mining hackathon breakout:
- 09:45 – 09:50: Getting ready (Fabio Rinaldi)
- 09:50 – 11:00: Hands-on session
- 11:00 – 11:30: Coffee break
- 11:30 – 13:00: Hands-on session (continued)
- Curation jamboree breakout:
- 09:45 – 09:50: Getting ready (Martin Krallinger)
- 09:50 – 11:00: Hands-on session
- 11:00 – 11:30: Coffee break
- 11:30 – 13:00: Hands-on session (continued)
- 13:00 – 14:00: Lunch
Afternoon session:
- 14:00 – 16:00: Session 6 – Text mining integration into curation workflows (Chair: Fabio Rinaldi)
- 14:00 – 14:20: Assisted curation pipeline of RegulonDB (Carlos Méndez / Yalbi Balderas)
- 14:20 – 14:40: neXtA5: supporting biocuration activities at neXtProt (Pascale Gaudet)
- 14:40 – 15:00: To be announced
- 15:00 – 15:30: Discussion: how can we build an effective text mining pipeline integrated into curation workflows in gene regulation?
- 15:30 – 16:00: Coffee break
- 16:00 – 17:30: Session 7 – Closing remarks (Chairs: Fabio Rinaldi, Martin Krallinger and Martin Kuiper)
- 16:00 – 16:30: Hackathon and jamboree results (Martin Krallinger / Fabio Rinaldi)
- 16:30 – 17:00: New collaborations – the way forward
- 17:00 – 17:30: Next steps and action points
Summary Results
During a series of presentations and working sessions, two central but disparate communities involved in gene regulation knowledge management immersed themselves in each other's aims, challenges, and working modes. On the one hand was the computationally oriented community that provides text-mining-assisted information extraction, curation tools, and resources (such as STRING). On the other hand were the biocurators of the central knowledge bases represented in GREEKC (UniProt, IntAct, SIGNOR, Complex Portal). The curation jamborees showed that curation guidelines must be well designed, well justified, clearly explained, and tailored to the specific curation task at hand. They also illustrated that the rationale and working mode of curators of high-precision, high-quality, fine-grained, well-provenanced knowledge bases is, and has to be, different from the working mode of large-scale information extraction. The workshop demonstrated the value of text mining, of conceptual curation frameworks such as MI2CAST, and of executable curation tools such as the VSM-based SciCura. It also revealed that there is still some way to go before the two communities realise the full potential of productive collaboration in gene regulation knowledge management.
Zenodo community containing the datasets for both the Jamboree and the Hackathon