Drupal, RDFa and the “fauxpository”

This post was originally published on the Application Profiles Support blog at UKOLN.

Drupal 7 is likely to be released soon, and will include native support for RDFa. The RDF module for Drupal 6 already allows this functionality. Why is this important? Because it makes relationships between resources much easier to describe through Drupal’s user-friendly interface and, in the process, would allow documents to be available as linked data.
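
To make the idea of RDFa concrete, here is a minimal sketch in Python of the kind of markup involved. It is not Drupal's actual output: the function name, the example.org URI and the choice of Dublin Core properties are assumptions made purely for illustration. The point is only that ordinary HTML elements carry extra attributes (about, property) that turn the visible text into machine-readable statements.

```python
from html import escape

# Dublin Core terms namespace; the "dc" prefix is a local choice for this sketch.
DCTERMS = "http://purl.org/dc/terms/"


def node_to_rdfa(uri, title, creator):
    """Render a hypothetical node as an HTML fragment with RDFa attributes.

    'about' names the resource being described; each 'property' attribute
    says which metadata element the enclosed text is the value of.
    """
    return (
        f'<div xmlns:dc="{DCTERMS}" about="{escape(uri)}">\n'
        f'  <h2 property="dc:title">{escape(title)}</h2>\n'
        f'  <span property="dc:creator">{escape(creator)}</span>\n'
        f'</div>'
    )


# Illustrative values only.
fragment = node_to_rdfa("http://example.org/node/1", "Drupal & RDFa", "A. Author")
print(fragment)
```

A human reader sees only a heading and a name, while an RDFa-aware parser can extract the title and creator of the resource from the same markup.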

In Drupal terminology, a “node” is effectively a metadata record, and various Drupal modules enable the easy customisation of metadata. In effect, you could build a repository on the basis of Drupal, by-passing the need for platform-specific knowledge tied to open source software that has increasingly moved towards the “enterprise solution” space, along with all of the technical tie-in that it usually entails. For the service provider, it is not dissimilar to the tie-in experienced with commercial software, especially in the case of information librarians or other professionals who are not developers, or even developers who are not part of that particular open source development team.

Application Profiles are essentially structured metadata comprising elements and (usually) relationships, and are therefore inherently linked data solutions. They vary in complexity according to their particular functional requirements: for instance, in the world of scholarly publications, there is a spectrum between the straightforward, unstructured way that DSpace implements Dublin Core (which should perhaps be called the DSpace Application Profile), the simplified FRBR structure of the Scholarly Works Application Profile (SWAP) and the complex entity-relationship model of CERIF, the standard developed for Current Research Information Systems (CRISs). This latter is a de facto application profile, even if it is not normally referred to as such.
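
As a rough illustration of what “elements and relationships” means in practice, the sketch below holds a handful of statements as subject, predicate, object triples and follows the chain from a work down to a downloadable copy. The example.org URIs are invented, and the predicate names are modelled on the simplified FRBR pattern that SWAP uses; treat them as assumptions for illustration, not as a normative rendering of the profile.

```python
# Hypothetical base URI for the example entities.
EX = "http://example.org/"

# Each statement is a (subject, predicate, object) triple.
# Predicate names are illustrative, modelled on a simplified FRBR chain.
triples = [
    (EX + "work/1", "isExpressedAs", EX + "expression/1"),
    (EX + "expression/1", "isManifestedAs", EX + "manifestation/1"),
    (EX + "manifestation/1", "isAvailableAs", EX + "copy/1"),
    (EX + "work/1", "isCreatedBy", EX + "agent/1"),
]


def objects_of(subject, predicate):
    """Return every object linked to `subject` by `predicate`."""
    return [o for s, p, o in triples if s == subject and p == predicate]


def copies_of_work(work):
    """Follow work -> expression -> manifestation -> copy."""
    copies = []
    for expression in objects_of(work, "isExpressedAs"):
        for manifestation in objects_of(expression, "isManifestedAs"):
            copies.extend(objects_of(manifestation, "isAvailableAs"))
    return copies


print(copies_of_work(EX + "work/1"))
```

A flat record of the DSpace kind would need only the element and value pairs; it is the relationships between separately identified entities that make the more structured profiles harder to implement.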

Why should Drupal be any better than the repository platforms that already exist? In many ways, it depends on what you need to do with it, and on the resources at your disposal. But the advantage is that Drupal is a flexible Content Management Framework that is designed to be leveraged for any sort of content, and for new modules to be designed easily for new purposes. After all, what does a repository actually do that other websites cannot? Repositories put metadata records and bitstreams (the actual documents or files) on the Web, and add a few additional services such as OAI-PMH, SWORD and statistics. But repositories are only a particular specialised subset of content management systems. Drupal is accessible to any PHP developer without any initial requirement of particular specialist platform knowledge, which is relatively easy to obtain. The community is large and support is quite easily available, as are modules that can be adapted for local purposes. It is designed to be easy to customise and theme.

Sarah Currier recently talked about the idea of a “fauxpository”. If I remember correctly, she pointed out that it could even be based on WordPress. This is clearly a workable idea, although hardly suitable for production use as a university service. I would maintain that Drupal could easily be suitable for such a use with relatively little work, and could make use of and adapt application profiles in a way that the major open source repository platforms have been slow to do, and are still only just beginning to enable as something of an afterthought. UKOLN are investigating how Drupal can be used to implement the JISC’s Dublin Core Application Profiles (DCAPs); using Drupal is intended to show how the profiles can work independently of tie-in to any specific platform.

The future of dispersed social networks?

I just saw this interesting video about Diaspora:

[vimeo http://vimeo.com/11099292]

In real human conversations, do you pass what you say on to a third party so that they can control it, format it and keep it before they agree to send it to the person you’re talking to? No, of course you don’t. Yes, there is CCTV, censorship (in some countries), and all sorts of covert monitoring, but you still talk to the person directly. You talk, write a letter, even an email, with absolute control over the content.

There are all sorts of providers: postmen, ISPs, email services, but no single data controller. Yet with social networks, Facebook or MySpace does just that. They even suggest whether you should re-connect with someone without giving you any real say over the nature of that friendship: it always annoys me! Like all real people, I have all sorts of relationships on many levels, and it should be up to me how to run them.

So, why not have a dispersed social network? It’s been tried before in various ways but there have been various sticking points. For instance, Elgg is a great open source social networking system, but it only works within one particular instance of the software. You can’t install multiple instances and make them talk to each other over the whole internet in a way that could rival Facebook. No reason why not in principle, except that it hasn’t been made for that. OpenSocial provides an API for various data hubs like hi5, LinkedIn, MySpace, Netlog, Ning, orkut, and Yahoo! to talk to each other. You could perhaps get Elgg and OpenSocial to work together, amongst other possibilities.

But four American kids have just got together to spend three months of their lives working full time on developing Diaspora. They seem to know how the Web works on a fundamental level. I hope that in three months’ time, they not only deliver the goods but actually get people to use it, so that it spreads in a viral way.

It’s not long since nobody was on Facebook. Since then it has peaked, even though people are still joining, and avid users are deserting it due to privacy mistakes on the part of Facebook, the way that they control people’s data and alter the service without notice. Twitter has suffered in a similar way. Remember when MySpace wasn’t just for bands and teenagers? Perhaps for some, it still isn’t. But I don’t meet many adults who aren’t musicians who really do much with MySpace. Web services can lose ground as quickly as they gain it, sometimes at breathtaking speed. Yet others persist for years before it eventually happens. Whatever happens, the Web is a dynamic place.

I’m not unaware of the irony that I’m posting this to precisely one of the big data hubs I’m talking about! [Ed.: now migrated to an independent server.] Not everyone wants to run their blog, social network or other service on their own server. But it’s important that if you want to, you can – and that whether you do or not, you should be able to talk to other people, whatever their choices may be. For instance, I can use wordpress.com, as here, or could equally download the software from wordpress.org. WordPress give me that choice.

In three months’ time, I hope to see Diaspora come to something. Maybe it will, maybe it won’t, but it’s an interesting venture by what seems to be an interesting group of young developers. The idea of giving people back control over their data, whether or not it takes off in the end, seems to me to be an inherently sound one. But I’m a socialist, as you may have gathered, so I believe in people voluntarily sharing things for a better society. That is what talking is all about, whatever your politics may be.

Mitthas – the fruit of patience is sweet

Sabr ka phal mitha hota hai (Urdu/Hindi saying)

“The fruit of patience is sweet”

Apparently the Mittha, or Mithta Nimbu (Citrus limettoides) is now known to be a cross of Sour Orange and Citron. The name just means Sweet (Lime). The second word can mean either Lime or Lemon. It’s known as the Indian Sweet Lime or the Palestinian Sweet Lime.

They are not the same as the Mediterranean fruit sometimes called Sweet Lime (Citrus limetta), with which they are often confused, although they don’t look or taste especially similar. The main thing is that neither is particularly acid.

Apparently it’s well known that children like it in particular, whereas its relative lack of acidity is less attractive to adults. It’s also sometimes said that it suits the Indian palate but not the Western one. I’ve been wondering since my grandmother picked some from a tree at my grandparents’ house in Model Town, Lahore, in the summer of 1987, what kind of fruit it is. Nobody but me liked them, but to her amusement I was mad about them. My dad and my aunt were always amused to tell the story. I remember that they taste something like lime and orange, but are much more palatable than limes.

Twenty-three years later, I wonder if I would still like them (as I’m sure my palate has changed a lot since then), and whether I could get hold of some here in Britain. But now, at last, I know what they are.

[Edit: changed spelling to Mittha on the suggestion of a comment below, probably a better romanisation than the above, 13 April 2011.]

Linked data and Dublin Core Application Profiles in EPrints 3.2.0

This post was originally published on the Application Profiles Support blog at UKOLN.

EPrints 3.2.0 was released on 10th March 2010. It has some remarkable new features relating to linked data and, consequently, to Dublin Core Application Profiles based on multiple entity domain models such as SWAP, IAP and TBMAP (the GAP does not have a domain model). Here are the key points:

Linked Data Support

Ability to establish arbitrary relations between objects or provide additional metadata in triple form.

Semantic Web / Linked Data (RDF)

We have made a (difficult) decision to move these features to 3.2.1 (due out soon after 3.2.0) because testing showed it caused a significant slow down.

We’re rewriting it to do the same thing but with much less overhead!

However, as may be seen on the EPrints wiki, the latter section read as follows until 11th March 2010:

Semantic Web Support

RDF+XML Format
N3 Format
URIs for all objects, including non dataobjs. [sic] eg. Authors, Events, Locations.
BIBO Ontology
Extendable
URIs now use content negotiation to decide which export plugin to redirect to, based on mime-types supplied by plugins and the “accept” header.
Relations between eprints and documents
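
The content negotiation mentioned in the list above can be sketched roughly as follows. This is not EPrints code: the plugin names and the mapping from MIME types to plugins are hypothetical, and the parsing is deliberately simplified (it ignores wildcards such as */*). It only shows the general idea of reading the client's “accept” header and picking the export format the client prefers.

```python
def parse_accept(header):
    """Return (mime_type, q) pairs from an Accept header, best first."""
    prefs = []
    for part in header.split(","):
        fields = [f.strip() for f in part.split(";")]
        mime = fields[0]
        if not mime:
            continue
        q = 1.0  # a missing q parameter means full preference
        for f in fields[1:]:
            if f.startswith("q="):
                try:
                    q = float(f[2:])
                except ValueError:
                    q = 0.0  # treat a malformed weight as "not acceptable"
        prefs.append((mime, q))
    # The sort is stable, so types with equal q keep their header order.
    return sorted(prefs, key=lambda p: p[1], reverse=True)


# Hypothetical registry: MIME type offered -> name of the export plugin.
PLUGINS = {
    "application/rdf+xml": "RDFXML",
    "text/n3": "RDFN3",
    "text/html": "HTML",
}


def choose_plugin(accept_header, default="HTML"):
    """Pick the first acceptable plugin, falling back to a default."""
    for mime, q in parse_accept(accept_header):
        if q > 0 and mime in PLUGINS:
            return PLUGINS[mime]
    return default


print(choose_plugin("text/html;q=0.5, application/rdf+xml"))
```

The same URI can therefore serve a web page to a browser and RDF to a linked data client, which is what makes URIs for authors, events and locations useful beyond the repository itself.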

If this is taken at face value, it appears that there has been significant progress in enabling features that would allow the full implementation of the JISC’s DCAPs based on the simplified FRBR model, although we must wait for some important details until the promised version 3.2.1, which is to be released “soon after 3.2.0” according to the statement above. Although objects may be described with “arbitrary relations” and “additional metadata” (additional to what?) can be described in triple form, there are not yet URIs for all entities, such as Authors and so on. Presumably, the support for BIBO would be more demanding than the support required for the cut-down version of FRBR as seen, for example, in SWAP.

This is all very promising, especially in the light of the same functionality being promised in DSpace 2.0, which was not yet implemented in the recent release of DSpace 1.6.0 [Ed.: press release no longer available]. However, all of this must come with the caveat that, until this is tried out in practice, it is not certain which levels of implementation are possible: clearly, the actual metadata fields can easily be adopted by any repository, but what about the relationships between entities, and the relationships with other complex objects? How exactly will these be implemented in practice? For the purposes of linked data, we also have to wait until EPrints 3.2.1 for metadata in the RDF+XML format.

To this end, although UKOLN cannot offer a publicly accessible test repository with user access, we hope wherever possible to implement and test these pieces of repository software for their usability with SWAP, IAP, TBMAP, GAP and DC-Ed in the first instance, since the majority of repositories in the UK HE sector use these platforms. Of course, we would also like to do the same with Fedora at some point in the future. However, if you have evidence of any such implementations, even for test purposes, and if you are happy for us to evaluate these, we would be very happy to hear from you.

Afghanistan and the North West Frontier

When I stayed at my aunt’s house recently, we talked about all things Punjabi, as we often do. At some point, I pointed out that in the 70s, rather like Beirut in the Lebanon, Kabul had something of a reputation for being a magnet for westerners. The first Marks & Spencer’s in Central Asia was built there! She agreed that it had been quite unlike today, and coolly added the detail that she had gone there on a school trip once when she was young. You couldn’t imagine a world more different from today.

Since then, I have been reading A History of India by John Keay. His prose style is rather grandiose and meandering for a modern historian, I find, but it is quite enjoyable. It has been a while since I read much history that wasn’t either Celtic or linguistic, so it has been a welcome change. My interest, I must admit, comes in great part from my own ethnic origins as a Punjabi. Although I’ve never been to Afghanistan myself, I went to the rather prosaically named North West Frontier Province (NWFP – named of course by the British) in 1987 when I was ten. We travelled from Lahore to Rawalpindi (the old city beside the new Islamabad), on to Abbottabad, to the town of Balakot in the Kaghan Valley, and then to Naran village. My grandfather had a house in Ayubia, and we also went there. More recently, in 2005, the area was hit by a serious earthquake and Balakot was flattened. I remember dangling our legs in the river there, being careful not to get too close to the strong currents. It was not then an especially dangerous area to visit, despite being so close to the narrow strip of land that is Pakistani-controlled Azad Jammu and Kashmir (literally “Free Jammu and Kashmir”, often shortened to just Azad Kashmir), separated from Indian-controlled Jammu and Kashmir by the 1972 Line of Control.

The Kunhar River in the Kaghan Valley near Naran

Why is this relevant to a Punjabi, you may ask? Well, apart from my Dad memorably realising he was able to speak Hindko, and the villagers assuring him that, whatever he thought he was speaking, it was definitely not in their opinion his native Punjabi despite the fact that he had spoken it as such all his life (no doubt in a funny accent, to them), the whole area is very much a mixing pot of related languages and peoples. Lahore has always been full of Kashmiris and Pathans over the centuries, among waves of other peoples from the north, and most families including ours have married into families who still pride themselves as Kashmiri. And Lahore was an important battle ground between the Durrani Empire of the Afghans and the Sikh Empire of the Maharaja Ranjit Singh, until the British Raj overtook the latter. In fact, I recently saw his surcoat and his shamshir (sword) in the Wallace Collection in Hertford House in London. [Note: it is not, I discover, a talwar, which is similar.] I’m rather surprised the Sikhs don’t want it back – on second thoughts, perhaps they do…! The mausoleum or Samadhi of Ranjit Singh, Sher-e-Punjab (the Lion of Punjab) is in Lahore, rather dilapidated these days: modern Muslim Punjab is perhaps unsurprisingly not very interested in it compared to the other monumental architecture of Lahore that we saw, such as the Badshahi Masjid (the largest mosque in the world until 1986, the year before my second visit to Lahore), Lahore Fort, and Shalimar Gardens.

Samadhi of Ranjit Singh

Badshahi Masjid (Mosque)

Pakistan itself has a name that was quite cunningly crafted from a partial acronym and yet has the same meaning in both Urdu and Persian, “the land of the pure”. The name and acronym were suggested by Choudhary Rahmat Ali, where the P is for Punjab, the A for Afghania, and the K for Kashmir (it was originally Pakstan but the letter I was inserted partly for euphony and partly for the river Indus). As well as being part of the suffix -stan “place, homeland”, the S is also for Sindh.

Major ethnic groups of Pakistan

The name Pakistan is thus a portmanteau word and a pun in two languages, which is impressive enough to make up for its shortcoming in failing to cover the Baloch of SW Pakistan, whose homeland is on both sides of the Iranian border, and who represent the large, mostly desert province of Balochistan within Pakistan. This is an undeniable slight, but one suspects it cannot be the only reason for unrest and attempted secession in the province in the 60s and 70s, which continues in some form today. In fact, some Baloch claim that the former princely state of Kalat, i.e. a significant part of Balochistan, was illegally annexed by Pakistan in 1947. The founder of the nation, Quaid-e-Azam, otherwise Muhammad Ali Jinnah, is the man that Baloch separatists blame. The Baloch are an Iranian people, like the Pathans. The slight formerly extended to East Bengal that became East Pakistan, and is now Bangladesh, divided from the other half of Bengal in India, just like the Punjab. As readers will know, that state did secede from Pakistan with the assistance of the Indian Army in 1971. The humiliating Pakistani surrender to Indian forces still hurts many Pakistanis. In my own case, the hurt is personal rather than political, however, and the loss of East Bengal seems otherwise like old history already.

So why Afghania? It has never been the formal name of any country or region, but it is occasionally applied informally to the NWFP and the FATA (Federally Administered Tribal Areas), which have always been dominated by Pathans. Pathans are an Eastern Iranian people, whereas the Hindko and Pothwars of NWFP are a North Indic speaking people, although the two have co-existed for centuries among various other ethnicities of the wider region. The reason is that FATA, NWFP (and even former parts of the latter that were absorbed into Punjab as late as 1970), were annexed by the British to British India from Afghanistan. The Afghans – even apparently Hamid Karzai – have always held that the Durand line of 1893 has no legal basis despite further treaties with the British recognising it in 1919, 1921 and 1930, albeit under threat of force after the Third Afghan War. Interestingly, wider claims have been made that include Balochistan as well, which are perhaps related to Baloch nationalist aims.

Edit: Since 2010, the former NWFP has been known as Khyber Pakhtunkhwa – 13 April 2011.

Afghanistan before the Durand Line agreements

Pushtun territorial claims according to the United States

In no way do I mean to judge these disagreements for myself, except to sympathise with any ethnic group of the region that have felt oppressed or divided: international boundaries are rarely fair and are usually to the detriment of one or other party. They are wholly artificial, as people don’t settle the land so neatly in most parts of the world.

However, the Durrani Empire and the Sikh Empire are significant here. The Durrani are a Pathan (or Pashtun) tribe who are also partly descended from Alexander’s Greek troops. At their greatest extent they controlled the Punjab (including much that is now in India, and the former parts of Punjab that are now Haryana and Himachal Pradesh) and even large parts of Balochistan. Ranjit Singh defeated the Durrani and wrested back control of Punjab and much of what is now NWFP and FATA. These have always been Muslim areas, though: even Lahore was 60% Muslim until the bilateral expulsions of 1947.

Punjab under the British

The divided Punjab

It strikes me that in many ways, the relationship of Pakistan and Afghanistan is an old and complex one. It is just as complex as the ironic, mutually belligerent relationship with India whereby the bulk of the Mughal mosques and palaces of the Muslims are in India, yet the river Indus after which Hind or India is named lies entirely within modern Pakistan. The Punjab is the key to all this. It is at once Indic, having more to do ethnically with the fertile centre of India and the Hindus and Sikh Punjabis now severed from the bulk of the province that lies in Pakistan, and at the same time Persianised and Muslim under influences from the west. West Punjab has always enjoyed a steady cultural influx of Iranian, Indic and other northern peoples over the centuries from across the Karakoram and Hindu Kush mountain ranges. The Durand line has never meant a lot to the Pathans (or Pashtuns) who it divides.

Artwork on a rock of the so-called Afghan Girl (Sharbat Gula) depicted on the June 1985 cover of National Geographic magazine.

One is tempted to compare the sparsely populated but physically expansive Afghanistan and the much more populous Punjab-Sindh rump of Pakistan with Scotland and England in the 18th century. (Perhaps the rebelliousness of the Baloch and the Welsh is a more tenuous comparison!) The Highlanders, like the Afghans, were warlike tribes who governed themselves, yet periodically invaded the southern lands. Up in the Himalaya, in the Karakoram and Hindu Kush ranges, the peoples are either more or less Persians (Iranians such as the Dari, Pathans and Tajik) or else highly persianised Indic peoples, and virtually all are Muslims. It is not surprising then that the last king, Mohammed Zahir Shah of Afghanistan seriously discussed a merger [Ed.: link no longer available] with Pakistan – with himself as head of state! Equally, the latter have always found it in their interests to recognise the porous nature of the border and thus their influence across it. I would be surprised if the issue is not raised again, if and when Afghanistan becomes relatively more stable again. In reality, though, it would be more realistic to describe such a merger as Pakistan absorbing Afghanistan rather than the other way round, given their respective populations. Already the old doubts are being encouraged [Ed.: link no longer available] about the legality of the border. The stalwart Pathans (of whom twice as many live in Pakistan as in Afghanistan) are the backbone of the Pakistani Army, and Pakistan is strategically weak against India. Perhaps it would then stand some chance of wresting back control of Srinagar and the Muslim Kashmir Valley, though I doubt that majority Hindu Jammu or even Ladakh, split between Buddhists and Shia Muslims, will ever be regained. There is of course one major flaw in my comparison: Afghanistan is much bigger, more varied, and far higher up in the mountains than the Highlands of Scotland.

Topography of Afghanistan

Two Breton etymologies

[References to be added; comments and corrections will be acknowledged.]

NB: this article is pending serious revision. The present version will be kept available for archival purposes but there are several serious etymological considerations why Romani is a doubtful source for B. paotr, and why an Alanic origin has apparently insurmountable phonological difficulties. Whitley Stokes presented a case for an Old French origin *paltr > pautre but further research is required into the validity of its supposed Germanic origin. I will propose that it could instead have been borrowed into Old French from Domari during the Crusades, following my original theory, as there remained until modern times a so-called “Gypsy” (i.e. Dom) quarter in Jerusalem. It could then have been passed to Breton, although there may be some slight chance it was also borrowed directly by a Breton contingent in the French army. Domari is now seriously endangered. Domari and Romani are related, but not perhaps as closely as was formerly believed.

New changes are now noted in bold.

Words in the three Brythonic languages that start with the letter <p> always arouse my interest when it seems clear that they are neither (1) words that had an initial /kʷ/ in Common Celtic that became /p/ in certain dialects including Brittonic, e.g. Ir. ceann, WCB. pen(n) < Britt. *kʷenn, nor (2) Latin or other loan words. This is because original /p/ in Celtic was lost, e.g. Ir. íasc “fish”, W. ŵysg “Usk” (river), W. pysg “fish” (loan) < L. piscis. As a consequence, they tend to stick out like a sore thumb as loan words of one origin or another. I realised in 2002 that two words in Breton seem to fall into this category, and wrote a highly speculative piece in Keleier Breizh / Newyddion Llydaw, vol. 32 (August 2002), about the first of these two words.

Paotr “boy”

This Breton word normally means “boy” or “lad” and after a possessive can sometimes mean “son”, which is to say that ma faotr can be effectively equivalent to ma mab; lastly it can mean just “man”, especially in the plural paotred, where it is used as the plural of den where this means “man” rather than “person”. (The words gour and gwaz are relatively rarely used to mean simply “man” in the general, everyday sense.) It is, of course, an extremely commonly used word in everyday speech.

As well as the problem mentioned above, this word also contains the medial consonant group /tr/, which should always otherwise have been voiced to /dr/ in native words. That it has not is a sure sign that the word is a loan word, after lenition had already occurred.

As an ethnic Punjabi who speaks Breton, Welsh and Cornish (but not Punjabi or Urdu except for a few isolated words), a strange realisation came to me that I think very few people would have been in the position to come to. Now, I’m not in any way trying to suggest that there are any Punjabi loan words in Breton! That would be astonishing, not to say surreal: taking the self-effacing Indo-Pakistani joke about the older generation saying that “everything is from India” just a little too far! What I noticed was that there is a strong similarity, nonetheless, with the Punjabi word putr “son”, “boy”. (The Urdu/Hindi word is beta, and my experience is that it tends to be spoken by Punjabis in a less emotionally charged manner.) My father, when he was calling to me or my brother affectionately in his usual, expansive style – usually from the other side of the house – would call out our names, pause, and then loudly call mera putr! “my son” when he felt that we hadn’t answered suitably instantaneously. At other times he would use it as an affectionate term in a slightly less loud way, out of love, pride or both.

In my short speculative article “Ur ger brezhoneg eus Brezelioù ar Groaz”, my suggestion was that the word might have somehow been borrowed from a language related to the North Indic group of Indo-Aryan languages during the Crusades. In hindsight – I was only 25! – this was an obvious and clear mistake, since the mechanism by which it could have reached Breton would be very far-fetched. By which language, perhaps no longer extant and inexplicably removed from North India, did it get transferred? [Editor’s subsequent note: there remains a so-called “Gypsy” quarter in Jerusalem, and in consideration of this point, it is no longer to be considered at all far-fetched that it could reach Breton, probably via Old French.] Very soon after I published this, my friend Matthew Spikes in Aberystwyth answered this very simply and effectively: through one of the dialects of Romani. I should have thought of this myself, of course, and have been kicking myself ever since!

As you may know, Romani (or Romany) is considered to have travelled from India to Europe via an area in which it is still most common, Romania, Bulgaria, and the Balkans, at some time prior to the invasion of the Greek Byzantine Empire in Anatolia (modern Turkey) by the Ottoman Turks, who brought Islam to Turkey. It is not, needless to say, related to Romanian, which is a Romance language descended from Latin, perhaps incorporating a Dacian substrate and certainly being affected by neighbouring languages such as Greek, Slavic, Turkish and Hungarian. The similarity of their names is coincidental, because Romani derives from their word Rom “a Romani man”, and the confusion remains a source of political and ethnic strife between the two. (Of course, the name Romania is derived from L. romanus “a Roman”.) The reason that Romani cannot have travelled to Europe after this time is the lack of loanwords from Turkish or Arabic, whereas it is my understanding that it has Persian, Armenian and Greek loanwords, amongst others. Some Romani groups are called Sinti, which appears to recall the name of the Sindhi language, which in turn means more or less “of the river Indus”. This form with initial /s/ is the native form of the river name and the associated province Sindh, where my father was born and where he lived as a boy in Sukkur, whereas the Urdu/Hindi/Hindustani word Hind “India” and the derived words Hindi and Hindu that have been borrowed into English are from the Iranian form with initial /h/. [The name Sinti may only be coincidentally similar? To be discussed.]

I must add the disclaimer that I am not, by any means, an expert in Indic languages, and all of my comments here must be considered in the light of that understanding. My knowledge is either familial or has been acquired in a rather piecemeal manner.

It must be said that the word for “boy” is not now putr in at least the majority of Romani dialects, if any. [Editor’s later note: there may be no such word in modern Romani: did it exist in early Romani, as it does in Domari?] There are forms similar to chavo, from which it is said we borrowed the English chav, which has recently come to have a rather tawdry connotation. Romani dialects are highly diverse, having spread out over Europe some considerable time ago. I don’t know enough to say for sure, from my limited reading, but perhaps it might be fair to say that it is surprising that they are not more diverse than they are.

There is also a so-called “archaic” (i.e. conservative) Indic language called Domari that spread out from India in a similar way to Romani, but apparently never made it to Europe from the Middle East, as well as the Jakati language (also Jataki, Jati) spoken in parts of North India, Afghanistan and – surprisingly far away! – in the Ukraine. These are also very diverse, and are split into various distinct groups. I’m not sure what their word for “son” is, as there seems to be very little information easily available about them. The Domari word for “son” is also putr, which makes it a candidate for the Crusader theory, although the mechanism for its transfer to Breton is still rather far-fetched. The case is even weaker for the Jats, who it seems certain were never present in the Middle East in any significant numbers and, I think, are a sufficiently unlikely source to be safely discounted, given their geographic and historical distribution. [Need to clarify about the different kinds of Jats and the relationship to the language name. Point out that the Ukraine is separated from the Middle East by the Black Sea and Anatolia, and that it equally provides no direct source for W. Europe.]

Before this theory leaves the realm of speculation, however, it will first need to be demonstrated that the kind of Romani spoken historically in Brittany, and more widely in France and in Western Europe, included the word in a form that is compatible with the Breton word as we have it. To my knowledge, no other etymology for paotr has been suggested. If anyone has a better one and this is all simply coincidence, I will happily eat my words. [NB Whitley Stokes suggested OFr. *paltr > pautre – to be discussed in further post. Thanks to Guto Rhys for reminding me of this.]

[8/1/2010] Alex Woolf of the University of St Andrews points out to me that there were considerable settlements of Alans, an East Iranian people from whom the Ossetians are descended, in Armorica during the 5th and 6th centuries, the same period that it was settled by Brythonic speakers from Great Britain. Given Avestan and Sanskrit putra, any Indic or Iranian language that was in direct contact with Armorica in this period needs to be carefully considered as a possible alternative source. There are place-names in Brittany that contain the common name Alan, which seems to show their significant influence during the period. Clearly, further work will be required to clarify this.

[9/1/2010] Prof. Nicholas Sims-Williams has informed me of two early Alanic sound changes that seem to make this hypothesis rather dubious, though not wholly impossible. It is difficult to see that any other Iranian language can provide a better origin theory, unless there are any other undocumented Iranian groups that settled in Armorica like the Alans. It seems unlikely that the Romans would have failed to comment on this in the period in question. [To be discussed in further post.]

Plac’h “girl”

This word is the equivalent of the above, the common word for “girl” and occasionally used to mean “daughter”, especially after a possessive, e.g. ma flac’h “my daughter”. The plural plac’hed can occasionally be substituted for merc’hed, which normally means “daughters”, where both of these can also convey the sense “women”. The latter is comparable to the identical use in Welsh, where merched means “daughters” and “women”. However, the word plac’h starts with <p>, is not a Latin loan, and does not appear to be a plausible native word for the reasons set out at the beginning of this post.

What, then, is the origin of this word? I suggest that, during the period in which there were substantial Irish settlements in Wales and Cornwall and probably sporadic ones in Brittany, the word was borrowed from OIr. fracc “wife”, “woman”, whose Brythonic cognates are W. gwrach, B. gwrac’h, C. gwragh “witch”, “hag” (also the related forms W. gwraig, B. gwreg, C. gwrek “wife”, “woman”, “lady”). The initial <p> would then be a hypercorrect de-spirantisation based on the phrase OIr. mo fracc > B. ma flac’h, where the lateral liquid consonant /l/ has replaced the similar liquid rhotic /r/.

The word plac’h is unusual in Breton, and in the Brythonic languages, because it is the only feminine word that is not lenited after the definite article, hence ar plac’h “the girl” is correct despite breaking the usual rule, and adjectives are frequently unlenited even where the sandhi rules would not normally prevent it. This is effectively optional, possibly because it has been sporadically restored by analogy. I suggest that this is strong circumstantial evidence for the etymology, as lenited **ar blac’h has never been created by a further process of analogy.

Again, to the best of my knowledge no etymology has ever been found for this word. If anyone presents a better one, again, I will happily eat my words.

[8/10/2010] Dr Simon Rodway points out to me that the Brythonic change /kk/ > /x/ probably preceded the Goidelic change /w/ > /f/ in *wracc > *fracc (although these changes happened at a similar time and occurred quite gradually). If so, this might present a small problem, but I will provisionally and tentatively suggest that, since the only similar final consonants in Brythonic were by this time voiced /g/ and spirant /x/ and the Irish long (or double) /kk/ would have been a strange sound to their ears, the latter might have been a more obvious choice as a sound substitution. The alternative /g/ is distinctively different, being a voiced sound, which could explain why we do not see Br. **frag instead. Although I initially wondered if ongoing contact between speakers of the two language groups might create a long-term awareness that Br. /x/ and OIr. /kk/ were equivalent, the problem is that this could equally be argued for initial Br. /gw/ versus initial Goid. /w/ > /f/, which would instead lead to Br. gwrach being used in this sense, undermining the very motivation for the loan. It seems better to seek better defined mechanisms such as sound substitution and specific processes of analogy.

Clearly, further supporting evidence needs to be brought to bear on this example, both on the above points and the substitution of /l/ for /r/.

Scope of this blog #2

I’ve realised that this blog should really reflect all of my interests and that I shouldn’t go on using it for work purposes. It will now also cover my interests in minority languages, and specifically in Cornish, Breton and Welsh, historical phonology, in addition to my personal views on popular science and communication technologies. In fact, while it’s not going to include details of my life as such, it will reflect my interests more broadly than it has before and cover all sorts of topics.

I hope that will encourage me to update it more often, and that consequently it will become more interesting. I’ve decided to re-name the blog “The Indo-European” at least in part because of my mixed Anglo-Punjabi parentage, along with my linguistic interests.

Complexity: perceived or real?

This post was originally published on the Application Profiles Support blog at UKOLN.

One anecdotal remark that is often made about SWAP in particular, and about the JISC DCAPs in general, is that they are based on domain models that are too complex. But too complex for what?

Too complex to fit with how real repositories work?

Too complex to create usable input forms?

Too complex for users to understand? Do we mean real end users, or service providers such as repository managers? Do we mean anybody who is using the web forms to input metadata about digital objects, both content providers (who may also be users of the content) and repository managers?

It seems that there is an evidential problem innate in all of these assertions. It’s also worth remembering that not all resource types, and hence application profiles, are equal in this regard – nor are all users, content providers and repository managers. It’s also fair to say that sufficient work has not yet been done on investigating interface design and usability to be able to say for certain that the complexity of a data model necessarily makes the input forms difficult to use. There is an aspect of back-end software design to this question as well: the input forms may very well be simplified if the software can intelligently suggest relationships for the user to agree or reject, and generate as much of the record behind the scenes as possible.

Current work at UKOLN is aimed at solving these evidential problems by providing a methodology for investigating the best way to construct metadata

I can’t unequivocally answer the question of whether the JISC DCAPs have data models that are too complex to fit with the way that the most common repository platforms organise their records. It appears, however, that DSpace 1.5 does not yet support entity-relationship models, and that EPrints has its own data model. However, the use of DCAPs as exchange formats has already been shown to be a fruitful alternative approach, as EPrints already has a SWAP export plug-in to do this. It is generally asserted that Fedora can already handle any data model. It is for the repository platform developers, ultimately, to provide the final answer to these questions. It’s clear that a lot of work is going on to address some of these issues. For example, it has been said that DSpace 2.0 will support entity-relationship models.

What I can say, however, is that the inability to support a back-end entity-relationship model does not by any means restrict a particular software platform from making use of an application profile, although there may well be a considerable demand in terms of development time in making the necessary functionality available. This is because there is clearly another alternative, namely emulating the entity-relationship model. To begin to understand this possibility, it’s necessary to take a close look at how the JISC DCAPs have been constructed, and the different classes of metadata that you find within them:

metadata about the digital object(s) themselves, i.e. the usual stuff in any repository
metadata elements relating to the semantic relationships between entities, i.e. isExpressedAs, isManifestedAs, isAvailableAs and variations thereof. These exist purely for the sake of the particular model that has been chosen, here a reduced form of FRBR. It is interesting that the dc:creator field, which is “real” metadata about the object, is the only link to the Agent identity, which may be seen, from the perspective of the object, as an entity that exists to express more detail about an item of metadata describing the object itself. In fact, it is an independent entity that could relate to multiple unrelated objects, of course.
identifiers: these are specific to the repository instance and application profile in question. Of course, all digital objects on the Web require at least one URI to identify them (in practice, nearly always a URL that also locates them). However, the entity model required by the application profile, if it isn’t emulated as described hereafter, cuts up a compound digital object in such a way that it is possible to apply further identifiers to each entity as discrete metadata records.
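
To make the idea of emulation a little more concrete, here is a minimal sketch in Python of how a flat record might be unpacked into a chain of linked entities. Everything in it is an assumption made for illustration: the field names, the "prefix.name" convention for sorting fields, the identifier scheme and the helper emulate_entities are all hypothetical, and the entity names simply follow the reduced FRBR structure discussed above. It is not how any particular repository platform does this.

```python
from dataclasses import dataclass, field

# Hypothetical flat record, as a platform without entity-relationship
# support might store it. All field names are illustrative assumptions.
flat_record = {
    "dc:title": "An example paper",
    "dc:creator": "A. Author",
    "expression.version": "accepted manuscript",
    "manifestation.format": "application/pdf",
    "copy.uri": "http://repository.example.org/123/paper.pdf",
}


@dataclass
class Entity:
    kind: str
    identifier: str
    metadata: dict = field(default_factory=dict)   # "real" metadata
    relations: list = field(default_factory=list)  # (predicate, target id)


def emulate_entities(record, base_id):
    """Unpack a flat record into four linked entities (hypothetical helper)."""
    work = Entity("ScholarlyWork", f"{base_id}#work")
    expression = Entity("Expression", f"{base_id}#expression")
    manifestation = Entity("Manifestation", f"{base_id}#manifestation")
    copy = Entity("Copy", f"{base_id}#copy")

    by_prefix = {
        "expression": expression,
        "manifestation": manifestation,
        "copy": copy,
    }
    # Class 1: descriptive metadata is sorted onto the entity it describes.
    for key, value in record.items():
        prefix, _, name = key.partition(".")
        if name and prefix in by_prefix:
            by_prefix[prefix].metadata[name] = value
        else:
            work.metadata[key] = value

    # Class 2: relationship elements are generated, not typed in by the user.
    work.relations.append(("isExpressedAs", expression.identifier))
    expression.relations.append(("isManifestedAs", manifestation.identifier))
    manifestation.relations.append(("isAvailableAs", copy.identifier))

    # Class 3: the identifiers were minted above from the record's base URI.
    return [work, expression, manifestation, copy]


entities = emulate_entities(flat_record, "http://repository.example.org/123")
for entity in entities:
    print(entity.kind, entity.identifier, entity.metadata, entity.relations)
```

The point of the sketch is only that the relationship elements and the extra identifiers can be derived behind the scenes from one flat record, so the person filling in the form need never see them.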

It must be said that this is NOT the only way to do relationships between digital objects. It’s perfectly possible to use, for example, OAI-ORE (or plain RDF) resource maps in place of the second of these types of “metadata” here. In fact, they are really not metadata about the object at all, because they describe the relationships of different parts of that metadata to each other: so they are really meta-metadata! It could be said that identifiers don’t describe the actual objects either, merely locate their metadata descriptions, so they are also meta-metadata. Change the way you do the modelling, and the meta-metadata may change – however, this is NOT true of the “real” metadata (title, author, image size etc) that describe the object itself.
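
The distinction can be illustrated with a small sketch in Python, using plain tuples as stand-ins for RDF triples rather than a real RDF library. The "ex:" identifiers, the sample values and the helper split_graph are hypothetical; the ore:describes and ore:aggregates terms are borrowed from the OAI-ORE vocabulary, but the second model here is only loosely in the spirit of a resource map, not a valid one.

```python
# "Real" metadata about the object: the same whichever model is chosen.
descriptive = {
    ("ex:paper", "dc:title", "An example paper"),
    ("ex:paper", "dc:creator", "A. Author"),
}

# Relationship statements under a reduced FRBR model.
frbr_relations = {
    ("ex:paper", "isExpressedAs", "ex:paper-v1"),
    ("ex:paper-v1", "isManifestedAs", "ex:paper-v1-pdf"),
    ("ex:paper-v1-pdf", "isAvailableAs", "ex:paper-v1-pdf-copy"),
}

# Relationship statements for the same object under an aggregation model.
aggregation_relations = {
    ("ex:map", "ore:describes", "ex:aggregation"),
    ("ex:aggregation", "ore:aggregates", "ex:paper"),
}

# Predicates treated as "meta-metadata" in this sketch (an assumption).
RELATION_PREDICATES = {
    "isExpressedAs", "isManifestedAs", "isAvailableAs",
    "ore:describes", "ore:aggregates",
}


def split_graph(graph):
    """Separate descriptive triples from relationship triples."""
    relations = {t for t in graph if t[1] in RELATION_PREDICATES}
    return graph - relations, relations


frbr_graph = descriptive | frbr_relations
aggregation_graph = descriptive | aggregation_relations

frbr_descriptive, frbr_meta = split_graph(frbr_graph)
agg_descriptive, agg_meta = split_graph(aggregation_graph)

# The descriptive part survives the change of model; the relations do not.
print(frbr_descriptive == agg_descriptive)
print(frbr_meta == agg_meta)
```

Swapping one set of relationship triples for the other leaves the title and creator statements untouched, which is the sense in which only the meta-metadata depends on the modelling.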

Scope of this blog #1

[Edit] This statement was replaced by a newer one which links to the About page. This is no longer a work blog and is written in my private time.

This blog was formerly called “Talat’s Repositories Blog”. The new name reflects a widening of its scope, not any loss of interest in the original subject area.

Although my current work centres around Application Profiles, working for UKOLN has made it increasingly obvious to me that information science has always been, and must remain, an interdisciplinary area where it is counter-productive to limit oneself purely to a narrow area of research and development. If services and standards are developed without reference to the wider dynamic, user-facing world of information science, they will be likely to place limitations on their long-term usability and life cycle in a given community. As far as metadata developments are concerned, systems other than repositories such as Current Research Information Systems and publications systems are arguably of equal importance, if not greater, to those to which we would usually accord the title “repositories”.

In re-branding and re-purposing this blog slightly, then, the intention is to maintain a focus on repositories and associated systems but also to keep an eye on wider developments in the dissemination of intellectual communications on the Web, through whatever technology may be relevant to its purpose.

DuraSpace

[Edit] This is a legacy post. This blog is no longer connected with my work activities in any way and is written entirely in my own private time.

The recent announcement of the merger [Ed.: press release deleted] of DSpace and Fedora Commons as DuraSpace is potentially a very significant advance in the repositories sector. Although the two platforms will continue to exist as separate entities, they will no doubt collaborate to their mutual benefit in technical development. In addition, new software products such as DuraCloud are described.

I have personally found DSpace to be an effective and flexible platform, although until version 1.5 it was missing some fundamental functionality that meant it was overall an inferior product to EPrints. However, I have always said that if certain issues were sorted out, such as a more granular permissions system, versioning and so on, it is otherwise as good overall as EPrints. (I particularly appreciate how easy it is to see full metadata records in DSpace, from the point of view of research, though this is an entirely trivial technical point – it just happens to suit my work in my present and previous post!)

DSpace is clearly second to EPrints in terms of market penetration, but it is the only other major competitor to enjoy such a sizeable market share. Fedora is third not on its technical merits but largely because it is not a “packaged” product and requires much more customisation. It is evident that both platforms have much to gain from the collaboration. I would bet that the EPrints people may well have cause to worry about their future market dominance, given this development.

I’m particularly interested because Fedora makes much greater use of RDF, a technology that has its supporters and detractors, but has not been the basis for the wholesale change to the promised Semantic Web that might have been hoped for. However, one can see the potential application within content management systems such as repositories. One stumbling block seems to be that triple stores are not particularly efficient databases and need significant optimisation efforts before they rival traditional relational databases, a subject on which I am not a great expert at present. (I thank a colleague at UKOLN for educating me on this.) It is particularly interesting, then, to note the reference to efforts to improve the triplestore-based storage layer Mulgara.

I’m awaiting further developments with considerable interest, noting the new version DSpace 1.5.2 and recent references to the planned versions 1.6 and 2.0. I wonder how much the repositories community will have changed in a year’s time? Things seem to be moving fast right now.