About the Corpus
The Swiss Twitter Corpus is a continuously growing collection of Tweets related to Switzerland, gathered since January 2018. Whether they originate in Switzerland, discuss a Swiss topic or are written by a prominent Swiss author, all collected Tweets are relevant to Switzerland and capture what’s going on in and around the country.
It aims to be a comprehensive data trove that enables us to answer almost any question you might have regarding Switzerland and its citizens.
Each Tweet is annotated, where applicable, with the Swiss topics it relates to, its geographic location and the language it is written in (English, German, French, Italian or Swiss German). Tweets are also annotated with sentiment information, telling you how positive or negative they are.
To further improve the usefulness of the corpus, each Tweet is annotated with a “Swissness” score, which measures just how “Swiss” the Tweet is, giving you full control over how specific you want to be.
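As a rough illustration of how such a score could be used to control specificity, the sketch below filters Tweets by a threshold. The `Tweet` record, the `swissness` field name and the 0.0 to 1.0 range are assumptions made for this example; the actual corpus format and score scale are not described here.

```python
from dataclasses import dataclass


@dataclass
class Tweet:
    # Hypothetical record layout, not the actual corpus schema.
    text: str
    swissness: float  # assumed to lie between 0.0 and 1.0


def filter_by_swissness(tweets, threshold):
    """Keep only Tweets whose score is at or above the threshold."""
    return [t for t in tweets if t.swissness >= threshold]


# Made-up sample data for illustration only.
sample = [
    Tweet("Hiking near Engelberg today", 0.9),
    Tweet("Watching tennis", 0.4),
    Tweet("Monday again", 0.1),
]

strict = filter_by_swissness(sample, 0.8)  # very specific selection
broad = filter_by_swissness(sample, 0.3)   # more inclusive selection
```

A higher threshold yields a smaller, more specifically Swiss subset; a lower one trades specificity for coverage.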
Example Use Cases
- See if Swiss people like a certain topic and how this differs by canton
- See how Swiss people view Swiss topics compared to people from other countries
- Find out how people with different interests are connected to each other
- Find common interests for Swiss user segments
- Determine the most important word/topic in Switzerland this year
- Compare language usage across cantons
The distribution of Tweets by geo-coordinates closely follows the population distribution in Switzerland, with Zurich having the most Tweets and the highest population, followed by Geneva, Basel, Bern and Lausanne. One exception is people and companies that target all of Switzerland geographically: they cause the towns of Engelberg and Sachseln (the geographic "middle of Switzerland") to appear disproportionately often.
The name of Switzerland and its variations, along with popular cities, are very prominent on Twitter. Roger Federer is by far the most prominent Swiss personality, with roughly 5 times as many Tweets as Granit Xhaka in second place. Famous Swiss companies are also present, with Credit Suisse leading the field, followed closely by Rolex and Swatch, reflecting the national focus on banking and watches.
Mentions of keywords relevant to the No Billag discussion rise steadily as the vote approaches, with #nobillag being the most prominent term, followed by #billag and #neinzunobillag. Surprisingly, discussion falls significantly during the 8 days preceding the vote, indicating that most of the debate takes place up to one week before the vote. The day of the vote itself is marked by a sharp increase in Tweets, followed by a quick decline back to baseline, with people losing interest only 2-5 days after the vote.
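A trend like this can be obtained by counting, per day, the Tweets that mention at least one tracked keyword. The following is a minimal sketch under stated assumptions: the `(date, text)` input format, the function name `daily_mentions` and the sample Tweets are invented for illustration and do not reflect how the corpus is actually stored or analyzed.

```python
from collections import Counter
from datetime import date


def daily_mentions(tweets, keywords):
    """Count Tweets per day that contain at least one of the keywords.

    `tweets` is an iterable of (date, text) pairs (an assumed format).
    Matching is case-insensitive and done on whitespace-separated tokens.
    """
    wanted = {k.lower() for k in keywords}
    counts = Counter()
    for day, text in tweets:
        tokens = {tok.lower().strip(".,!?:;") for tok in text.split()}
        if tokens & wanted:
            counts[day] += 1
    return counts


# Made-up sample data; dates and texts are illustrative only.
sample = [
    (date(2018, 3, 3), "Abstimmen gehen! #NoBillag"),
    (date(2018, 3, 4), "#nobillag results are in"),
    (date(2018, 3, 4), "Nein. #neinzunobillag #billag"),
    (date(2018, 3, 4), "Sunny in Bern"),
]

counts = daily_mentions(sample, ["#nobillag", "#billag", "#neinzunobillag"])
```

Each Tweet is counted once per day even if it contains several of the keywords, so the result reflects the number of Tweets rather than the number of hashtag occurrences.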
Tweets from all cantons are annotated with sentiment (positive, neutral, negative) and averaged per canton. This shows whether people in a canton tweet mostly in a positive or a negative way; "I love cats!" is an example of a positive sentence and "I hate my work" of a negative one.
The cantons of Jura and Lucerne have the most positive attitude overall, whereas the cantons of Aargau and Schaffhausen are tied for the most negative attitude.
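One simple way to average sentiment labels per canton is to map each label to a number and take the mean. The sketch below assumes a mapping of negative to -1, neutral to 0 and positive to +1, as well as a `(canton, label)` input format; both are illustrative choices, not the method actually used to produce the ranking above.

```python
from collections import defaultdict

# Assumed numeric mapping for the three sentiment labels.
SENTIMENT_VALUE = {"negative": -1, "neutral": 0, "positive": 1}


def average_sentiment_by_canton(records):
    """Return the mean sentiment value per canton.

    `records` is an iterable of (canton, label) pairs (an assumed format).
    """
    totals = defaultdict(int)
    counts = defaultdict(int)
    for canton, label in records:
        totals[canton] += SENTIMENT_VALUE[label]
        counts[canton] += 1
    return {canton: totals[canton] / counts[canton] for canton in totals}


# Made-up sample data for illustration only.
sample = [
    ("JU", "positive"),
    ("JU", "neutral"),
    ("AG", "negative"),
    ("AG", "neutral"),
    ("AG", "negative"),
]

averages = average_sentiment_by_canton(sample)
```

Values above zero indicate a predominantly positive canton and values below zero a predominantly negative one.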
As of 15.04.2018:
20’000 Tweets per Day
2 Million Tweets so far
500’000 Geolocated Tweets
Over 600 Relevant Swiss Topics
Using the Corpus
The corpus can be utilized in the following ways:
- We analyze the data for you: Just tell us what it is you want to know and we take care of the rest
- Get a subset of the corpus: Provide us with a search query and we provide you with all the relevant Tweets
- Access the full corpus: Get all the Tweets and use them in your projects
For more information, please contact us.
SpinningBytes AG is a leading company in Text Analytics, Natural Language Understanding and Processing, and Artificial Intelligence. We are a spin-off of both the ZHAW and ETH universities, and we aim to bring state-of-the-art research to industrial applications and to help you get the most value out of your data.