
Lump it together and split the difference: taxonomy, folksonomy and the uses of genres…

On the back of a couple of recent Twitter conversations, and of borrowing my brother’s iPod (other MP3 players are available…), I have been thinking more about taxonomy and its bastard sibling, “folksonomy”.

A taxonomy is formal, top-down and rigid; a folksonomy is informal, bottom-up and flexible. Clearly, both have their uses – to identify books in a library one needs a common (and hence inflexible) system of classification – a taxonomy like the Dewey decimal classification, but to find books at home one might use a more manageable system (like/keep/recycle, perhaps).

The system you choose depends a lot on what your uses are.

In biological taxonomy, there is often a battle between lumpers and splitters: since the classification is a human construct (albeit one that aims to follow natural divisions), whether a population of organisms is a variety or subspecies or species is up for debate.

Splitters tend to argue that different types are separate; lumpers tend to minimise the differences. As an example, I used to work on bracken (many, many years ago!); back then, it was considered to be a single species with two sub-species and twelve varieties – all lumped together, in other words; I believed that now these had been split into two species and twelve sub-species – but I was wrong: the accepted taxonomy of bracken is that it consists of up to twelve separate species. In this case, the splitters have won.

Looking at my brother’s iPod, I saw that he is a splitter; I, on the other hand, seem to be a radical lumper. We classify the music on our players very differently, each using our own folksonomy. The genre field of the iPod database is where I noticed this. My brother uses hundreds of different genres – separating jazz, for instance, by time and location (“1960s West Coast Jazz”, “1970s UK Jazz”, and so on).

I use four different genres, reckoning that all my music can be classified by just jazz, rock, classical and folk. I tried to get it down to three, but however hard I tried I couldn’t pretend that a couple of the “folk” artists could be put into “rock”.

Whilst this reflects our outlooks – I would have problems trying to classify artists down to too much detail (would Dave Holland or John McLaughlin [warning – launches music!] be UK or US jazz? And is it jazz or fusion? Or jazz-rock?) – it also stems from the different ways we use our players.

I either know exactly what I want to listen to – in which case I will find the artist and the recording – or I play music on shuffle (by album), in which case I want a broad sample to choose from. I know the broad genre I’m interested in – jazz, rock or classical – and then I shuffle through albums until something grabs me.

My brother doesn’t shuffle at all: he uses the genre field to locate what he wants to listen to.

There are lessons in this familial divergence. The kind of classification used – whether a top-down taxonomy or a bottom-up, user-created folksonomy, and the fineness of the splitting or lumping – depends very much on what the classification is going to be used for and who the audience is.

And coping with both a taxonomy and folksonomy (like iTunes does, through playlists) makes a lot of sense.
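To make the idea of running both systems side by side concrete, here is a minimal sketch. It is not how iTunes works internally: the `Library` class, its method names, and the tags given to each artist are all assumptions made up for illustration. The genre comes from a small fixed list (the lumper's view), while free-form tags can be as fine-grained as anyone likes (the splitter's view).

```python
from collections import defaultdict


class Library:
    """Hypothetical music library: one fixed genre per artist, plus free tags."""

    def __init__(self, genres):
        self.genres = set(genres)      # closed, top-down vocabulary
        self.genre_of = {}             # artist -> exactly one genre
        self.tags = defaultdict(set)   # open, user-created tag -> artists

    def add(self, artist, genre, tags=()):
        # The taxonomy is rigid: unknown genres are rejected.
        if genre not in self.genres:
            raise ValueError(f"unknown genre: {genre!r}")
        self.genre_of[artist] = genre
        # The folksonomy is flexible: any tag is accepted.
        for tag in tags:
            self.tags[tag].add(artist)

    def by_genre(self, genre):
        return sorted(a for a, g in self.genre_of.items() if g == genre)

    def by_tag(self, tag):
        return sorted(self.tags.get(tag, ()))


# Illustrative data only: the tags are examples, not a judgement on the artists.
library = Library(["jazz", "rock", "classical", "folk"])
library.add("Dave Holland", "jazz", tags=["fusion"])
library.add("John McLaughlin", "jazz", tags=["fusion", "jazz-rock"])

print(library.by_genre("jazz"))
print(library.by_tag("jazz-rock"))
```

The broad genre supports shuffling through a wide sample; the tags support going straight to something specific. Neither has to win.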

An Inordinate Fondness for Learning: building a taxonomy

I have been meaning for a while to write about one of the more interesting projects I’ve had, and a conversation I had last week with Janet Davies has finally prompted me to do so.

We were talking about different ways of classifying things – documents and information, largely, but also – because of my background – plants.

This is everyday stuff for Janet, but it took me back five years or so, to when I was tasked with developing the classification – the taxonomy – for the implementation of a new learning management system in a large bank.

The purpose of classifying each piece of learning (or learning object – it could be a book, a video, part of an online learning programme, a weblink – or a multiple of any of these) was to help people find the bits of learning they needed for their role, and to help them plan their own development. Surprisingly, evidence showed most people liked to browse categories rather than search for what they wanted. The taxonomy provided the categories people could browse in.

Developing this taxonomy wasn’t the usual sort of project we worked on, and the programme specifically looked for someone who thought about things from a different perspective. They got me…

Whilst I was working in learning, I had a different background to many of my colleagues: for one thing, I had spent some years studying botany, and that included a fair bit of botanical classification. My view of classification was therefore formed through plant taxonomy and systematics: I believed that the classification needed to make sense at a very fundamental, intrinsic level. It needed to be “natural”, to exist outside my creation of it. It should reflect the state of things.

When I have spoken to professional information managers – particularly librarians – they haven’t got this at all. Of course, the classification couldn’t exist outside of my creation of it – it didn’t exist before I started making it. The desire for a “natural classification” stemmed completely from a biologist’s view of systematics, where organisms fall into related categories (albeit with some fuzziness around the edges). Whilst one knows through experience what organisms are related to others – what are monocot or dicot plants, or which animals are reptiles and which are mammals – there is also a lot of fuzziness, and there are long, erudite debates as to where specific organisms should fit. There are many ways to classify plants, say: by the shape of their leaves, by the form of their stem, by the colour of their flowers; but it doesn’t take a plant taxonomist to know that yellow roses are more closely related to red roses than they are to daffodils.

(Incidentally, there is a fascinating phylogenetic tree in the window of the Wellcome Collection on Euston Road, showing the relationships between organisms in relation to, I believe, their numbers; what we think of as life on our planet – the plants and animals we are familiar with – form the smallest part of this image, hugely outnumbered by the various protists, prokaryotes and other life forms. We are biased towards organisms built on a similar scale to ourselves. God may have had an inordinate fondness for beetles, but he must have been absolutely crazy about bacteria and archaea. This one similarly emphasises plants’ and animals’ lack of importance – they’re the single branches Chlorophyta and Animalia, respectively.)

Phylogenetic tree. Used under Creative Commons licence.

Within the learning taxonomy I was tasked with developing, there were no natural characteristics – the learning was only defined by our relationship to it: by our definition of it. But it was important that the classification held together outside of my construct of it. For instance, it would have been possible – and very easy – to classify each piece of learning according to which bit of the organisation used it, but this was temporary: like most large organisations, the structure of this one changed frequently, and classifying learning in accordance with the organisation structure would have required frequent revisions. It would also have limited the accessibility of learning objects: if they were seen to be classified as if belonging to one part of the organisation, others wouldn’t want to use them.

This was very counter-cultural, and went against the power-politics in the bank. Different parts of the organisation believed that they were more important than others, and should own more bits of learning. After I had finished my design work and moved on to other projects, the head of learning gave in to one division’s demand that the learning their staff used should be classified according to their organisation structure. Two weeks later, this division was reorganised and merged with another, making a mockery of their revised classification, and making me feel smug and vindicated.

Designing the learning taxonomy was a complex task. There were tens of thousands of learning objects to classify, and each had to fit somewhere. The highest level of the classification had been agreed before I was brought into the project – there were seven key categories – but below each of those it was quite literally a blank sheet of paper.

That is what I started with: a blank sheet of paper. I drew mindmaps reflecting our knowledge about each of the major classifications, going deeper and deeper down. (I think the deepest level was six down – so each major category could have five layers of subcategory below it.) [And, as a prompt to myself, I really must write a post about mindmaps sometime…!]

I perceived each category or subcategory in two ways. Firstly, as a “bucket” into which learning could be placed – somewhere to hold the learning relative to other pieces of learning, where the relationships held together in a cohesive way. Secondly, as the “chapter headings” we would expect to see in a book about each topic: if someone were writing an overview of a particular subject, what would their chapter and section headings be?

Moving from one level down to the next creates the breadcrumb trail with which you’re probably familiar – most websites display their breadcrumb trail.
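As a rough illustration of how the buckets and the breadcrumb trail fit together, here is a minimal sketch of a category tree. The `Category` class and the category names are hypothetical, invented for this example; the bank's actual system and categories are not described here.

```python
class Category:
    """One node in a hypothetical category tree."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent      # None for a top-level category
        self.children = []        # subcategories one level down
        self.items = []           # the "bucket" of learning objects

    def add_child(self, name):
        child = Category(name, parent=self)
        self.children.append(child)
        return child

    def depth(self):
        """Number of levels above this node (0 for a top-level category)."""
        return 0 if self.parent is None else self.parent.depth() + 1

    def breadcrumb(self, sep=" > "):
        """Names from the top-level category down to this one."""
        names = []
        node = self
        while node is not None:
            names.append(node.name)
            node = node.parent
        return sep.join(reversed(names))


# Illustrative names only.
top = Category("Languages and culture")
languages = top.add_child("Languages")
spanish = languages.add_child("Spanish")
spanish.items.append("Beginner's Spanish (video)")

print(spanish.breadcrumb())  # Languages and culture > Languages > Spanish
```

Each node is both a bucket (its `items`) and a chapter heading (its `name`); walking up the `parent` links gives the trail.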

This was a one-off process: we wanted to make the classification as future-proof as possible. Whilst every piece of learning had to be classified, there might be some classifications which would end up without any learning being assigned to them. For instance, the organisation functioned internationally, so there was one part of the taxonomy that featured foreign languages and culture. The subcategories in this section included major languages for countries in which we didn’t operate but where we might aspire to. It didn’t include every language in the world, though – nor even every country: just those that represented potential significant markets.

It was a major piece of work. The classification had nearly 2,000 different categories. (OK, 1,940, if you’re being picky…) All in all, it took several months. Once the classification had been developed, I had to classify each of those tens of thousands of learning objects, and then I had to teach the organisation’s learning consultants how to classify the new bits of learning they were developing. Quite quickly, by nature of my involvement, I became the bank’s resident expert on learning classification.

It was also never going to be finished: the classification would evolve as our understanding of learning – and the organisation – improved.

I loved this project. It had the intellectual fascination to keep me interested, and I learned so much about so many different subjects as I was doing it. It was also quite lonely – I worked largely on my own, with very limited collaboration; not my preferred status.

It wasn’t perfect – far from it. The project – indeed, the whole LMS implementation – was born out of a need or desire to control. There was no space for a folksonomy: this was a top-down, centralised process.

But I did draw some lovely mindmaps…