The Eclipse mailing lists datasets

About this dataset

The Eclipse Foundation provides individuals and organizations with a commercially focused environment for open source software innovation. Its forge offers Git repositories, code reviews, issue management, continuous integration, forums, and mailing lists, among other services. Many well-known and widely used projects are hosted on the forge, including the Eclipse IDE itself, several IoT and modeling projects, and the new Java working group.

This dataset is published under the Creative Commons Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) licence.

The CSV extract

This dataset is a dump of all posts sent to all mailing lists hosted at the Eclipse forge. It includes only the list name, post ID, sent date, author name and address, and post subject; message bodies are omitted.

Although this is public data (the mailing lists can be browsed on the official mailman page), all data has been anonymised to prevent misuse.
The privacy issues identified, along with the anonymisation process, are covered in a dedicated document.
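
As a quick way to explore the extract, the number of posts per mailing list can be counted with standard shell tools. This is only a sketch: it assumes the CSV has a header row, is comma-separated, and lists the mailing list name in the first column, none of which is stated above. The function name and the file name are hypothetical.

```shell
# Count posts per mailing list in the CSV extract.
# Assumptions (not confirmed by the dataset description): the file has a
# header row, is comma-separated, and the list name is the first column.
posts_per_list() {
    # Skip the header, keep the first column, then count identical values,
    # most active lists first.
    tail -n +2 "$1" | cut -d, -f1 | sort | uniq -c | sort -rn
}

# Hypothetical file name; substitute the actual CSV download.
# posts_per_list eclipse_mls.csv
```

Only the first column is read, so commas inside later fields such as the subject do not affect the count. If the columns are ordered differently, adjust the `-f1` field number accordingly.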


Project mboxes

This dataset provides all Eclipse mailing lists as mboxes, compressed using gzip. The full list of downloads is as follows:

Note: the list was obtained with the following command:

for i in *.mbox.gz; do
     # Human-readable size of the archive, then one markdown list entry per file
     s=$(du -sh "$i" | cut -f1)
     echo "* [${i%%.mbox.gz}]($i) (size: $s)" >> list.txt
done
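
Once an archive has been downloaded, a rough message count can be obtained without unpacking it to disk. The sketch below relies on the usual mbox convention that every message starts with a "From " separator line. It assumes the archives escape body lines that begin with "From " (as most mbox writers do), otherwise the count is only an approximation. The function name and the archive name are hypothetical.

```shell
# Count the messages in a gzip-compressed mbox.
# Each message in an mbox starts with a "From " separator line; body lines
# beginning with "From " are normally escaped as ">From ", so they are not
# counted here.
count_messages() {
    # Decompress to stdout and count the separator lines.
    gzip -dc "$1" | grep -c '^From '
}

# Hypothetical archive name; substitute any of the downloads.
# count_messages some-list.mbox.gz
```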