NEWSGAC: News Genres Transparent Automatic Genre Classification
How genres in newspapers and television news can be detected automatically using machine learning in a transparent manner, to capture the shift from opinion-based to fact-centred reporting.
This project studies how genres in newspapers and television news can be detected automatically using machine learning in a transparent manner. This enables us to capture the shift from opinion-based to fact-centred reporting, which is often hypothesized but remains largely understudied because manual content analysis is highly time-consuming.
Moreover, we open the black box of machine learning by comparing, predicting and visualizing the effects of applying various algorithms to heterogeneous data of varying quality, with genre features that shift over time. This enables scholars to carry out large-scale analyses of (historic) texts and other media types and to critically evaluate the methodological effects of various machine learning approaches.
This project brings together the expertise of journalism history scholars (University of Groningen), specialists in data modelling, integration and analysis (CWI), digital collection experts (National Library & Netherlands Institute for Sound and Vision) and e-science engineers (eScience Centre). It uses a large manually annotated dataset (VIDI-project PI) to develop a transparent and reproducible approach to training an automatic classifier. Building upon this, the project generates three outcomes:
1. A study that revises our current understanding of the interrelated development of genre conventions in print and television journalism based upon large-scale automated content analysis via machine learning;
2. Metrics and guidelines for evaluating the bias and error of the different pre-processing and machine learning approaches and off-the-shelf software packages;
3. A dashboard that integrates, compares and visualises different algorithms and underlying machine learning approaches, and that can be integrated into the CLARIAH Media Suite.
Project info
Researchers
Professor of Media and Journalistic Culture, University of Groningen