Plagiarism scanner integrated into an editorial approval workflow
By: Thiago Campos Viana | March 6, 2017 | Business solutions, Web solutions, plagiarism scan, workflow, and plagscan
In our case study FindaTopDoc Prescribes eZ Publish for Healthy Content Management, we briefly covered our integration of PlagScan into the editorial approval workflow. When writing about medical topics, content -- especially medical term definitions -- can end up being duplicated on other sites, even if it was not purposely copied. Therefore, it is important for SEO reasons to ensure that all content on the FindaTopDoc site is as unique as possible. Here we'll take a closer look at how the plagiarism scanner integration works.
When dealing with a large group of writers, it can be time-consuming to check the submitted articles for plagiarized text. Checking each article manually against everything published on the web is practically impossible. That is where a service like PlagScan comes in handy.
PlagScan is a web service that verifies the authenticity of documents. It accesses billions of documents to compare content, and quickly gives a score ranging from 0% (no plagiarism detected) to 100% (full copy and paste of text).
For FindaTopDoc, we configured a rule that an article can be published only if its PlagScan score does not exceed 20%.
Using the PlagScan API, Mugo integrated PlagScan seamlessly into the eZ Publish back-end. When a writer creates an article in the back-end, they can run the plagiarism scanner up to three times themselves before sending the article to an editor. The system sends the article text to PlagScan, and the writer can then check the score and a plagiarism report. Based on the result, they can decide to make more edits to the text or submit the article to an editor.
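To make the writer-side limit concrete, here is a minimal sketch in Python of how a cap of three scans per draft could be enforced. It is not the actual eZ Publish integration: the names `ArticleDraft`, `ScanLimitReached` and `MAX_WRITER_SCANS` are made up for illustration, and `scan_fn` is a hypothetical stand-in for whatever code sends the text to PlagScan and returns the score.

```python
from dataclasses import dataclass, field
from typing import Callable, List

# From the article: writers may run the scanner up to three times themselves.
MAX_WRITER_SCANS = 3


class ScanLimitReached(Exception):
    """Raised when a writer has used all of their self-service scans."""


@dataclass
class ArticleDraft:
    """Hypothetical draft object; not part of eZ Publish or PlagScan."""

    text: str
    scores: List[float] = field(default_factory=list)  # one entry per writer scan

    @property
    def scans_remaining(self) -> int:
        return MAX_WRITER_SCANS - len(self.scores)

    def run_writer_scan(self, scan_fn: Callable[[str], float]) -> float:
        # scan_fn is an assumed callable that submits the text to the
        # plagiarism service and returns a score between 0 and 100.
        if self.scans_remaining <= 0:
            raise ScanLimitReached("No writer scans left; submit to an editor.")
        score = scan_fn(self.text)
        self.scores.append(score)
        return score
```

Passing the scan function in as a parameter keeps the limit logic separate from the remote call, so the counter can be exercised with a fake scanner instead of the real service.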
The editor can then do a final review, before setting the status to "Ready for Scan".
When the document is set to "Ready for Scan", the system performs a final scan before publishing it: it submits the article to PlagScan one more time and parses the response to get the final score. If the score is 20% or lower, the article is published. Otherwise, the article is set to "Failed", meaning the writer and/or editor will need to update the article appropriately and resubmit it through the workflow.
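The publish-or-fail decision can be sketched in a few lines of Python. This is only an illustration of the rule described above, not the code running on FindaTopDoc: the `Status` enum, the "Published" label, `final_scan` and `MAX_DUPLICATE_SCORE` are assumed names, and `scan_fn` is a hypothetical placeholder for the PlagScan request and response parsing.

```python
from enum import Enum
from typing import Callable

# From the article: a score at or below 20% is allowed to publish.
MAX_DUPLICATE_SCORE = 20.0


class Status(Enum):
    READY_FOR_SCAN = "Ready for Scan"
    PUBLISHED = "Published"  # assumed label; the article only says "published"
    FAILED = "Failed"


def final_scan(text: str, status: Status, scan_fn: Callable[[str], float]) -> Status:
    """Return the article status after the final scan.

    scan_fn is an assumed callable that submits the text to the plagiarism
    service and returns the score as a percentage from 0 to 100.
    """
    if status is not Status.READY_FOR_SCAN:
        # Only articles an editor has marked as ready are scanned.
        return status
    score = scan_fn(text)
    if not 0.0 <= score <= 100.0:
        raise ValueError(f"Score out of range: {score}")
    return Status.PUBLISHED if score <= MAX_DUPLICATE_SCORE else Status.FAILED
```

Note that the comparison is inclusive, so an article scoring exactly 20% is published, while anything above that goes back to the writer or editor as "Failed".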
The final result is a powerful, streamlined editorial workflow -- with special SEO considerations -- built on top of eZ Publish.