Improve handling of generated content by poppler
The poppler normaliser handles and generates a lot of content. When it runs for a long time over many documents, it accumulates a lot of useless content that has to be removed manually, as well as a lot of duplicated content that could have been shared.
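
One possible direction, given only as a rough sketch and not as the normaliser's actual design: store each generated file under a hash of its content so that identical outputs are shared, and periodically delete entries that no document references any more. The function names `store_shared` and `remove_unreferenced`, the flat store directory layout, and the choice of SHA-256 are all assumptions made for illustration.

```python
import hashlib
import shutil
from pathlib import Path


def store_shared(source: Path, store_dir: Path) -> Path:
    """Hypothetical helper: copy `source` into `store_dir` under its SHA-256
    digest. If a file with the same content is already stored, reuse it."""
    digest = hashlib.sha256(source.read_bytes()).hexdigest()
    target = store_dir / digest
    if not target.exists():
        store_dir.mkdir(parents=True, exist_ok=True)
        shutil.copyfile(source, target)
    return target


def remove_unreferenced(store_dir: Path, referenced: set) -> list:
    """Hypothetical helper: delete stored files whose digest is not in
    `referenced`, and return the names of the files that were removed."""
    removed = []
    for entry in sorted(store_dir.iterdir()):
        if entry.is_file() and entry.name not in referenced:
            entry.unlink()
            removed.append(entry.name)
    return removed
```

With a layout like this, two documents that produce the same generated file would point at a single stored copy, and the manual cleanup would become a call to `remove_unreferenced` with the set of digests still in use. How the normaliser would track which digests are still referenced is left open here.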