Friday, January 30, 2015

Watch this video about how to deal with non-compositionality in parsing of languages

How do we deal with languages that do not fit our limited compositional view of parsing? I was just reminded of the wonderful existence of Reut Tsarfaty and the beautiful little video that she made together with Noa Tsarfaty for her PhD.

Just watch it, it's very good.

Full reference: Tsarfaty, Reut. 2010 "Relational-Realizational Parsing". PhD Dissertation, University of Amsterdam. ILLC Dissertation Series DS-2010-01 ISBN:978-90-5776-205-5

Links: txt-file of abstract here, entire fulltext here

Extract from abstract: Statistical parsing models aim to assign accurate syntactic analyses to natural language sentences based on the patterns and frequencies observed in human-annotated training data. State-of-the-art statistical parsers to date demonstrate excellent performance in parsing English, but when the same models are applied to languages different than English, they hardly ever obtain comparable results. The grammar of English is quite unusual in that it is fairly configurational. This means that the order of words inside sentences in English is relatively rigid and that the morphology of words is rather impoverished. The main challenge associated with parsing languages that are less configurational than English, such as German, Arabic, Hebrew or Warlpiri, is the need to model and to statistically learn complex correspondence patterns between functions, i.e., sets of abstract grammatical relations, and their morphological and syntactic forms of realization. This thesis proposes a new model, called the Relational-Realizational (RR) model, that can effectively cope with parsing languages that allow for flexible word-order patterns and rich morphological marking.

Thursday, January 22, 2015

How to "fake" a language to a linguist fieldworker

My friend Calle drew my attention to the fact that the renowned linguist Lyle Campbell has written an article on how to spot that a speaker is actually faking knowledge of a language. Interesting, huh? And you can read this article for free online!

Full reference: Campbell, Lyle (2014) How to "fake" a language. Estudios de Lingüística Chibcha (ISSN 1409-245X) 33: 63-74

Link to free online PDF of official version

Link to in press-academia version (in case the above one has problems)

In the course of several years of fieldwork in Central America and Mexico seeking potential speakers of endangered languages, I encountered on several occasions individuals who attempted to fabricate a language, to create spontaneously what they hoped I would take to be an indigenous language. The number of cases in the sample considered here is not large; nevertheless, the goal of this paper is to attempt to make some general observations about how these individuals have attempted to fake a language. It may be valuable to be able to spot a fake and to distinguish it from real languages, particularly since new, previously unidentified languages have continued to turn up in this region. For example, Terrence Kaufman discovered Sakapulteko (Sacapultec), Sipakapense (Sipacapa) (Kaufman 1976), and Teko (Teco) (Kaufman 1969), three new Mayan languages; I discovered Jumaytepeque, a previously unknown Xinkan language (cf. Campbell 1979); and Roberto Zavala Maldonado (2014) recently discovered a previously unknown Mixe-Zoquean language of the Zoquean branch in Chiapas, Mexico. Clearly, then, one cannot conclude that anything different or unexpected must be a fake. The fakes I have encountered appear to exhibit defining characteristics that distinguish them readily from real languages, and these earmarks alone are sufficient for distinguishing the deceptions from real languages.
Why would a person try to fabricate a language, why one would engage in such deception? Whatever the full range of motives might be, two significant ones almost certainly involve money and status. 
As mentioned above, fakers run out of steam after a short while, not able to continue to come up with new “words.” 
The faked languages I have encountered exhibit none of the recurrent parts of “words” that might be associated with inflectional or derivational morphology in true languages. Fakers seem incapable of fabricating morphology. 
In short, it turns out to be very easy to detect attempts to fake a language in these situations – they exhibit the characteristics pointed out here.

Sunday, January 11, 2015

Akan and #lingwiki: a typical example of misinformation

I'm editing some wikipedia articles on Akan and neighbouring languages and dialects, as a part of #lingwiki. Wikipedia editing is fun, easy and you should do it too. Don't be scared, there's lots of help  - you are not the first beginner of wikipedia editing.

If you just search for "Akan" on wikipedia you get to a disambiguation page where they list several articles matching that search. There you find a link to the pages on Kwa languages and on Central Tano languages. That's relevant, as the Akan language is a part of those higher level groupings. However, it is not a good idea to have them both be described as "a stock of dialects spoken by the Akan people".

Now, of course, in some sense all languages are collections of dialects - heck, they're all collections of idiolects. But... Kwa is a really, really large group. Central Tano (which is part of Kwa) even includes another subgrouping, Akanic, which in its turn includes a macrolanguage (according to Ethnologue and the ISO standards), Akan, which in its turn is divided up by ISO 639-3 into two languages, Twi and Fante.

Even if theoretically languages are all made up of smaller units, this is not how we typically describe such high level groupings. It's like calling Romance languages a collection of dialects; it hardly seems like the appropriate level of description. I googled "wikipedia" and "a stock of dialects". It basically only gives hits for languages related to Akan, and the information has spread far.

I changed it, it now reads "language group which includes Akan". It might not be the best description, but since it's a disambiguation page for articles related to the search term "Akan" I saw it as ok.

There is a tendency for languages of areas less known to westerners to be lumped together as "dialects" even when they show a similar degree of mutual intelligibility and shared lexicon as varieties that we wouldn't hesitate to call different languages. Differences between language varieties we know better are sometimes valued as more critical than differences between varieties that we are less familiar with.

We probably cannot get at the perfect identification and classification of all languages, but we can at least try and be consistent. It makes for better research and better treatment of all people.

I wouldn't have noticed this if it weren't for #lingwiki. In the future I'm going to spend more time in the section for linguistics articles that need expanding and correcting.

p.s. I'm not even going to get into the discussion of whether "Kwa" is a good grouping or not. A lot of fair criticism has been launched at that analysis: that the languages most often included in Kwa are not related enough, or that they are similar due to contact, not genealogy.

Wednesday, January 7, 2015

Wanna keep updated on new exciting research? - filter mailinglists

If you wanna keep updated on new exciting research there's several things you can do. And it's not that hard, and if you balance it and filter the feed correctly you won't get swamped.

You can subscribe to certain blogs, tumblrs and twitter feeds of particular linguists or institutions. I'll create a list of suggestions soon. You can also subscribe to people on academia or find RSS feeds of departments or other research institutions.

However, one of the easiest ways to keep track is mailing lists. Linguist List is a non-profit organisation that among other things gathers a lot of interesting mailing lists; you can browse them all here. You can set it to give you digests instead of everything all the time. The most interesting list is probably "lingtyp" (according to me that is), but you gotta be a member of the Association for Linguistic Typology to get in there.

Linguist List also keeps two general lists, LINGUIST and LINGLITE. These are freely available and if you're keen on knowing what's going on in the field you should definitely subscribe.

Now, for a word of advice. Some of these mailing lists are not very active at all, some are very active. The LINGUIST list is one of the main outlets for information about conference calls, new books being released, job positions, reviews of publications etc in the field of linguistics. LINGUIST has labels ("jobs", "calls", "books" etc) that will help you categorise posts quickly. Now, if you get all of that into your inbox you are going to get swamped, and fast! That's why you need to get some filters into your inbox. I don't know what email client you have, but most have some sort of way of prioritising emails using your contacts and filter terms. So, figure out how that works and start setting some filters. Create new filters as soon as you find something that seems relevant, and weed out those that are over-generating uninteresting messages.

I've got filters on:

  • "pidgin" 
  • "creole" 
  • "nijmegen"
  • "reduplication"
  • "ulrike mosel"
  • "sign language typology"
  • "haspelmath"
  • "david gil"
  • "negation"
  • "harald hammarström"
  • "leipzig"
  • "samoan"
  • "samoa" 
  • "mpi nijmegen"
  • "max planck"
  • "glottolog"
  • "tirailleur"
  • "centre for language studies radboud"
  • "nordhoff"
  • "african languages"
  • "dissertation "grammar of""
  • "university of manitoba"
  • "stockholm university"
  • "östen dahl"
  • "parkvall"
  • "quantitative linguistics" 

to name a couple. Needless to say you need to keep them narrow enough to not get swamped, again :).
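If you're curious what such a filter boils down to, here's a minimal sketch in Python. It is not how any particular email client does it; the function name, the shortened term list and the example subject lines are all made up for illustration. It simply checks whether any filter term occurs in a subject line, ignoring case.

```python
# A shortened, illustrative selection of filter terms (see the full list above).
FILTER_TERMS = ["pidgin", "creole", "reduplication", "negation", "samoan", "glottolog"]


def matching_terms(subject, terms=FILTER_TERMS):
    """Return the filter terms that occur in a subject line, ignoring case."""
    lowered = subject.casefold()
    return [term for term in terms if term.casefold() in lowered]


# Made-up subject lines in the style of LINGUIST postings.
subjects = [
    "Calls: Workshop on Reduplication in Creole Languages",
    "Jobs: Lecturer in Computational Semantics",
    "Books: A Grammar of Samoan",
]

for subject in subjects:
    hits = matching_terms(subject)
    label = "keep" if hits else "skip"
    print(f"{label}: {subject} {hits}")
```

Note that this is plain substring matching, so a short term like "samoa" would also catch "Samoan". That's one reason to keep an eye on which terms over-generate.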

Reading those filter terms probably gives you a good idea of my research interests actually, now that I see them all listed that way. What are your filter words? If you don't have any, what do you think they'd be if you had them? I'm very curious ^^! Tell us!

The importance of studying diversity in the field of linguistics

Here's another one of those recommendations to go read something that is interesting, important, freely available online and quite accessible to non-scholars. Now, that we need to study diversity might be a rather trivial observation to many readers of this blog which is a lot about linguistic diversity, but it apparently still needs to be said and elaborated upon.

This time it's Greville Corbett who has kindly uploaded his paper "Why Linguists Need Languages" to the social sharing platform for academics: academia. This is sometimes known as blue open access, social sharing platform open access.

Free online PDF here

Full reference: Corbett, Greville G. (2001) Why Linguists Need Languages. In Maffi, Luisa (Ed) On Biocultural Diversity. Linking Language, Knowledge and the Environment. Washington & London: Smithsonian Institution Press, 517-530.

Quote from the paper:
To an outsider, it must seem self-evident that linguists would have a key role in any investigation of the interdependence of linguistic, cultural, and biological diversity. And yet there are many professional linguists who are not concerned about diversity and its imminent reduction. This is surprising, given the situation evoked by this chilling quote:

Obviously we must do some serious rethinking of our priorities, lest linguistics go down in history as the only science that presided obliviously over the disappearance of 90% of the very field to which it is dedicated. (Krauss 1992a:10)

This chapter makes the basic but essential point that linguistic diversity is central for linguistics. In order to do linguistics properly, we need every last language that there is. And this is a message for linguists as much as for others. While linguists have a strong professional interest in the issue, there is a much wider concern at stake.

If we want to understand ourselves, languages give away a good deal: In the study of language, we embrace the very definition of what it means to be humans and to be a developing member of the species. (John L. Locke 1993:4)

Abstract (from when he presented a paper with the same title at a conference)
Linguists are making increasingly detailed and sophisticated claims about the interrelations of linguistic constructions and of linguistic categories. Research of this type raises the question of the range of data required. Although at first sight the availability (in principle) of 6000 languages might appear wholly adequate, this is not straightforwardly the case. On the one hand, the same features may appear in different languages because they are genetically related; in fact, many languages form large families (Niger-Kordofanian has over 1000 members and Austronesian over 900), which drastically reduces the number of sources of data which are undeniably different. On the other hand, the areal spread of features means that even genetically unrelated languages may share features from a single source. These problems, made more acute by the rapid loss of languages (in Europe as in the other continents), are only just beginning to be appreciated by those who should be most aware of them, for quite selfish reasons, namely linguists. Examples will be given of particularly interesting features which have been found in languages that happen to be endangered, to give some idea of the seriousness of their loss for linguists.


You might also want to read the widely cited article below, which among other things explains to non-linguists interested in language (aka psychologists) that
a) far from all linguists are in agreement over certain universals and theoretical models that are sometimes widely used as representations of language in the mind
b) there's lots and lots of diversity.

Evans, N., & Levinson, S. C. (2009). The myth of language universals: Language diversity and its importance for cognitive science. Behavioral and Brain Sciences, 32(5), 429-492. doi:10.1017/S0140525X0999094X (free PDF available online)


Soon it's time for our collaborative wikipedia linguistics editing session!
Here are the most important things I want to tell you
The time for the LSA session is Saturday the 10th of January at 20-22 Pacific Standard Time (4-6 AM GMT/UTC on Sunday the 11th). You can also edit before or after that time, that's cool too.

The hashtag for twitter is #lingwiki, you can of course also use it on the tumblrs, book of faces and elsewhere on the internets, but just know that the main event will be in the twitter-sphere. We're @grammar_swag on the twitters, you can tweet to us with the hashtag #lingwiki, that'd be sweet ^^!

This edit-a-thon is being organised by Gretchen of All Things Linguistic. Lauren of Superlinguo will also be present, as well as Moti and Adèle-Elise of LingSpace and yours truly Hedvig from Humans Who Read Grammars. And you? Tweet us if you're game, or tell us here if you ain't got twitter.

You're welcome to join even if you're not a linguist, there's lots of things that need doing that don't require an in-depth knowledge of linguistics. Just don't submit information that you are not sure about. Gretchen explains this better.

I really recommend everyone reading Gretchen's post here about how to participate. Don't worry if you're not a wikipedia-veteran, it'll be fine.

Wondering what to write about? Well, we're mainly targeting linguistics stubs (underdeveloped articles), under-documented languages and biographies of important linguists (in particular women and people of minority background). Go have a browse on the wikiprojectpage for linguistics.

I'd also like to remind you that wikipedia is multilingual. If you are competent enough in a language other than English, then consider working on translating linguistics articles that are relatively "finished" into other languages.

Finally, I would like to recommend having a look at this page from Humans Who Read Grammars where I present several resources where you can learn more about specific terms and topics in linguistics. These resources will most likely prove very useful as we're getting to work on wikipedia.

In particular I would like to highlight the resource of Glottopedia, a wikipedia for linguists run by among others our fellow tumblrer Jan of Linguisten. Now, in this project right now we're working with "regular" wikipedia because we want to improve the knowledge of linguistics in the general population. However, if you're interested do visit Glottopedia and become a member. There is information in Glottopedia that can easily be adapted to wikipedia, and also the other way around. Information flow between Glottopedia and Wikipedia is super double plus good.

Let's spread linguistics on the internets!

p.s. if you're not familiar with incorporating moving images (gifs) into text like this, just know that it's a part of tumblr culture (and generally blog culture) and read up more on it if you're curious.  It doesn't mean we're less serious linguists, it's just a certain way of communicating that we sometimes appreciate. These are of Kyary Pamyu Pamyu and from the music video for her debut song Pon Pon Pon.

Saturday, January 3, 2015

Twas the Night Before Christmas (Linguists' Version)

Sorry for being late, so very late, but I wanted to share this with you.  Keep it and save it for next year, ok ^^?

A certain Dave Sayers sent out a poem on a mailing list for linguists just before Christmas. It's a remake of the traditional poem "Twas the Night Before Christmas".

Twas the Night Before Christmas (Linguists’ version), by Dave Sayers, 2014.
(Shared under a Creative Commons Attribution-ShareAlike licence. Also online alongside the original poem here.)

Twas the night before Christmas in the ivory tower,
Not a creature was stirring at the midnight hour,
Twas a problem for linguists who live to hear sounds,
Monophthongs, diphthongs, open or round.

We linguists were nestled all snug in our beds,
While visions of fricatives danced in our heads.
Snug in our gowns and our four-cornered caps,
We pondered enigmas like bilabial taps.

When out on the lawn there arose such a clatter.
I sprang from the bed hoping for research matter.
Away to the window I flew like a flash,
Hoping my equipment would record and not crash.

The moon made a shape like a back-rounded vowel,
Which was also the sound that I heard from an owl.
When, what should my wondering eyes quickly see,
But a representative sample of society.

Old folks with conservative pronunciations,
And fad-happy teens with their fresh innovations,
Networked globetrotters, laggards and lames,
Their dialect features I noted by name!

Now Nasal! Now Velar! Now Plosive, and Dental!
On Glottal! On Spirant! On Flap and Segmental!
To the top of the mouth! With the tip of the tongue!
Now palatalise! Labialise! Old, middle and young!

This I must record, and I must analyse,
But before I do that I can only surmise,
That this being Christmas and I being me,
There may be more surprises ready to see.

And then, in a twinkling, I heard in the kitchen,
A noise that soon set my microphone twitching,
As I drew in my head, and was turning around,
Down the chimney St Nicholas came with a bound.

He spoke in a language never before heard,
Not a sound ever known to man, beast or bird.
Impossible sounds he flung from his face,
He spoke like a machine, or a being from space.

Why hadn’t I learned this from tales as a child?
That his taps, clicks and trills were so phonetically wild?
It seemed that his mouth was drawn at an angle
That enabled these baffling articulatory tangles.

The stump of a pipe he held tight in his teeth,
Through which fell approximants past the IPA’s reach.
Pharyngeal nasals, glottal flaps quite deducible,
He produced all the phonemes that were thought unproduceable.

I looked through the glass at his hovering sleigh,
And realised at last where his origins lay.
He’d come from the future, he can travel through time,
Hence climbing every chimney in one night, even mine.

This also explained his weird trills, taps and flaps.
In his time, we’ve evolved to fill all these gaps.
He said his farewells and climbed back whence he’d bound,
But wait, how come I now understood all his sounds?

In his time, Google Translate’s become far superior.
It transferred his message into my mind’s interior.
So I heard him exclaim in his space-age vernacular,
"Happy Christmas to all, may your peer reviews be spectacular!"

Making linguistics on wikipedia better!

Hurray, let's make the world better by improving linguistics on the internets! There will be a coordinated effort to improve articles on linguistic topics on Wikipedia. It's being formally hosted at the annual meeting of the Linguistic Society of America (LSA), headed by Gretchen McCulloch of the excellent blog All Things Linguistic. She's written more about the editing session here.

It'll formally take place on Saturday the 10th of January at 20-22 Pacific Standard Time (04-06 GMT/UTC on Sunday the 11th).

We're especially focusing on:

1. Linguistics stubs
2. Under-documented languages
3. Biographies of prominent linguists, especially women and people of minority background

I'll be participating even though I won't be there in the flesh world. Wanna join? Give us a notification here. There's no restriction language-wise, no need to edit only English articles.

Let's celebrate with some Chris Pratt grooving.