Ever since I became a college student, emails have been an important part of my study and work life.
Later on, as I started piling up addresses in different countries, they turned into a fundamental point of contact with my social life. Gradually my emotional and personal view of emails as a means of communication shifted from "tool for planning and coordination" to a kind of forum where I hang out with my friends. I ended up thinking that, comparatively, "work" was only the small insignificant tail of the companion dog that was my emailing life.
If you had asked before I finished this project (and even if you hadn't asked), I would have asserted the following beliefs:
- I write more personal emails than work emails;
- my personal emails are, on average, much longer than the work-related ones;
- when writing to friends, I usually put more effort into the writing style (some of these texts are drafted three times or more). As a consequence, my personal emails show a "denser" vocabulary compared to work emails (i.e. a larger set of unique words in them).
These beliefs stuck with me for a while, right up until this project was completed. Although they had few consequences beyond my perception of my own writing habits, I wanted to know how things actually were, maybe for emotional reasons.
Looking at the facts - my emails of the last three years in English - it turned out things were not exactly as I thought.
I do not write mostly personal emails (anymore)
When I started believing my email life was mostly a social life (Fall '12), I was actually writing more emails to friends than for work: the slopes of the two curves are very different in that period. Personal emails rose significantly once more in Summer '13. Now, however, there is no big difference between the two. In absolute numbers, for what it's worth, personal emails may never catch up...
Still, if we look at total number of words sent, personal emails did catch up:
Average length of personal emails used to be higher
The last plot, on cumulative length, does not tell much about how long an average email of mine is in 2014. And how long is an average personal email now compared to a work-related one? Not much longer, apparently.
Looking at the left side of the plot, however, things start making sense: the few, exclusively long personal emails of Fall 2012 were gradually replaced by a larger variety of sizes.
The vocabulary of personal emails is less dense
One more belief turning out to be false.
Work-related emails rarely go beyond 800 words, but overall they contain more unique words relative to their total length. Not only that: if two emails of the same type differ in length, their vocabularies differ more when they are work-related than when they are personal.
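To make "unique words over total length" concrete, here is a minimal sketch of one way to compute such a density in Python. The names `tokenize` and `vocabulary_density`, the tokenization rule, and the two sample sentences are assumptions made for illustration; they are not necessarily what was used to produce the plots.

```python
import re


def tokenize(text):
    """Lowercase the text and split it into runs of letters and apostrophes."""
    return re.findall(r"[a-z']+", text.lower())


def vocabulary_density(text):
    """Return unique words divided by total words (0.0 for an empty text)."""
    words = tokenize(text)
    if not words:
        return 0.0
    return len(set(words)) / len(words)


# Two made-up one-line "emails", only to show the calculation.
work = "Please find the attached report and send the signed form back."
personal = "How is it going? It is going well, it is going really well!"

print(vocabulary_density(work))
print(vocabulary_density(personal))
```

A ratio like this naturally shrinks as a text gets longer, because common words start repeating, so it is best compared between emails of similar length.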
If you are curious about that super long email in the top right corner (you may have to click the Autoscale button in the plot to see it), you are not alone. I could not fathom what it could possibly be before digging for that exact blue spot with the help of Python. I found out it is an outlier not only in length but also in "type": it is an email I sent to a friend in which I had copied and pasted a philosophical article I wanted to share. A "personal" email, but not written by me.
Taking a Math break: Emails = People + Words
Stephen Wolfram's "The Personal Analytics of My Life" was the work that inspired me to write code to analyze my emails. Wolfram presents the visual story of his email habits since 1989 with a simple scatter plot. I was amazed: to me it was as if someone had poured orange paint all over the screen and a whole slice of this guy's life had come out. His email habits are considerably more intense than mine, since we are talking about around 13K emails sent every year.
The plots I have shown so far are very close in spirit to his analysis: emails and time. I have also told you a little about the words in them (length and vocabulary). But emails are more than that. For a start, the words in emails have a meaning for us.
Also, emails are sent to other humans (most of the time, at least). The online tool Immersion by MIT Media Lab examines the perspective "Whom you write to is who you are". Given access to your email account, it shows a network of the connections between you, the people who wrote to you and the people you wrote to. Immersion is like a scalpel one can use to vivisect time slices of crawling emailing life; complex apparatuses of networks of people are hiding under the soft skin of our inbox. Each of these tissues represents a period in time: the people you wrote to weekly in Fall '11 may not be the same during the next year; some will fade, others will be gone, more will have joined.
Immersion, however, did not make me feel immersed deeply enough for my taste. This online tool only takes into account the "number of exchanged emails" but no other properties of the interaction. A clerk at my school would show up as just as relevant as one of my closest buddies. I felt there was much more that software like Immersion could do if it used the content of the emails at all.
I asked myself if I could try to make some kind of visualization of some of my closest pen pals based on how and what I wrote to them.
In the remainder of this article I will tell you about my attempt to use the how.
How many (unique) words does it take to ask "How is it going?"
The plot above is an attempt to examine how my writing style changes according to the person I am writing to. Each bubble is a friend who received at least 50 emails from me in the last three years. The wordier the emails (on average), the fatter the bubble. The position on the chart depends on how many unique words I use on average in a single email (x axis) and on how many unique words I used throughout all my exchanges with that person (y axis).
These two notions of vocabulary size are different: less text or fewer topics within a single email may locate a person on the left of the chart; the higher the bubble, the larger the variety of topics and writing styles through time.
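The three quantities behind each bubble can be sketched in a few lines of Python. This is only an illustration under assumptions: the function name `recipient_stats`, the `(recipient, body)` input format, the `min_emails` parameter and the sample data are all hypothetical, and the tokenization is deliberately naive.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into runs of letters and apostrophes."""
    return re.findall(r"[a-z']+", text.lower())


def recipient_stats(emails, min_emails=1):
    """Compute per-recipient vocabulary statistics.

    emails: iterable of (recipient, body) pairs (an assumed input format).
    min_emails: drop recipients with fewer emails (the chart uses 50).
    """
    unique_counts = defaultdict(list)  # unique words in each single email
    lengths = defaultdict(list)        # total words in each single email
    vocabulary = defaultdict(set)      # all words ever sent to a recipient

    for recipient, body in emails:
        words = tokenize(body)
        unique_counts[recipient].append(len(set(words)))
        lengths[recipient].append(len(words))
        vocabulary[recipient].update(words)

    stats = {}
    for recipient, counts in unique_counts.items():
        n = len(counts)
        if n < min_emails:
            continue
        stats[recipient] = {
            "emails": n,
            "mean_unique_per_email": sum(counts) / n,      # x axis
            "total_unique": len(vocabulary[recipient]),    # y axis
            "mean_length": sum(lengths[recipient]) / n,    # bubble size
        }
    return stats


# Made-up data, only to show the shape of the result.
sample = [
    ("anna", "the cat sat on the mat"),
    ("anna", "the dog sat on the rug"),
    ("bob", "see you at noon"),
]
stats = recipient_stats(sample)
print(stats["anna"])
```

In the sample, both emails to "anna" use five unique words each, yet her overall vocabulary is seven words: the per-email average and the overall total measure different things.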
My colleagues (bottom-left bubbles), more than others, seem to receive emails that are short and to the point (in one word: boring). Also, they are two of the only three people above who live on the same continent (and in the same city) as I currently do. As a consequence, those bubbles are probably thinned by beers and dive bars often replacing screens and keystrokes for interaction.
Consistent with the previous plots on emails, longer emails have a larger vocabulary.
Another perhaps unsurprising fact: the uppermost bubble, which received almost 7,500 unique words in emails from me, is my significant other. Annika accounts for around 60% of the total volume of the personal emails I send. This high volume, and its variance, may also explain the comparatively small size of the bubble: those emails include long, carefully planned pieces as well as hurried one-liners.
A few questions for future work: are the vocabularies of all those bubbles separate? Can we build an automatic taxonomy of people based on the words or, even better, the categories we discuss?
Rabbits, Agents, Blankets: Wor(l)ds at a glance
[Figure: A word cloud of the whole corpus of my personal emails]
[Figure: A word cloud of the whole corpus of my work emails]
The first cloud is highly personal, filled with terms reporting emotional states and inside jokes; it also seems biased by listings of flight itineraries.
The second cloud is dominated by the keywords of my M.Sc. thesis and of my correspondence with my advisor at the time (as you may have guessed by now, the topic was "Belief Revision in Argumentative Agents"); many other words come from the Programming/Computer Science world. Several others were used as vehicles of vented frustration towards school bureaus (if you ask me, "stipend" in the second cloud is spelled exactly as "anxiety" in the first one).
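A word cloud is, at its core, a picture of word frequencies: the more often a word occurs in the corpus, the larger it is drawn. A minimal sketch of that counting step in Python follows; the function name `word_frequencies`, the tiny stop-word list and the sample texts are assumptions for illustration, and the clouds above may well have been produced differently.

```python
import re
from collections import Counter

# A tiny, hypothetical stop-word list; a real one would be much longer.
STOP_WORDS = {"the", "a", "an", "and", "to", "of", "in", "is", "it", "i", "you"}


def word_frequencies(texts, stop_words=STOP_WORDS, top=None):
    """Count words across all texts, skipping stop words.

    Returns (word, count) pairs sorted from most to least frequent;
    top=None returns every word.
    """
    counts = Counter()
    for text in texts:
        words = re.findall(r"[a-z]+", text.lower())
        counts.update(w for w in words if w not in stop_words)
    return counts.most_common(top)


# Made-up corpus, only to show the calculation.
corpus = ["The rabbit and the blanket", "A rabbit is an agent"]
print(word_frequencies(corpus, top=1))
```

Note that counting like this discards all context, which is exactly why some of the words in the clouds look so puzzling on their own.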
To me, one of the most striking aspects of those clouds is the lack of context in which those words were used. Looking at many of them in those frames, I am as curious and clueless as you probably are, reader. Although I would not really know why "West" has made it into that list, and you are probably not interested in the incidental origin of animal-based inside jokes, I thought it would be relevant to look at a few interesting examples of how meaning is harder to grasp without context:
- anal: I know what you are thinking - and what you are thinking is probably the same as what my mother said out loud when she saw this - but sorry to disappoint: the term "anal" is a frequent 'citation' of Freudian psychoanalysis that I use to explain to people my occasional excessive need for planning and my workaholism.
- isis: this is interesting for at least two reasons, both due to the lack of context. First, again, the way I use the word is probably not the same as the one a random reader may have in mind. Second, I wonder about the reaction of that random reader if that reader happens to be some governmental agency. So, if you are thinking of the self-proclaimed Islamic State, please know that isis in my emails is the Institute for the Study of Intelligent Systems at Utrecht University.
- survivor: the question here is "why did this word make it into the 'work' cloud?". This time it has to do with computer systems jargon: survivors turn out to be a concept in generational garbage collection, and a few years ago I had a significant email exchange with a person in the field to work on survivors.
Conclusions
I showed how emails can be dissected to tell something about who we are by looking at how we write and, in particular, how we write to certain people. That's not all there is, though: there are plenty of questions to be examined about how the what changes according to the people we write to. Such visualizations should make use of semantics and sentiment analysis.

