Microservices in (and beyond) Research Information Management

This post was originally published on the Technical Foundations web site at UKOLN.

Microservices: are they all that new?

Recently there has been something of a revival of interest in a small-scale development approach towards software design for repositories: microservices. This is far from an entirely new idea but seems to have been somewhat slow to develop in practice, even to date; a useful summary of the approach was given by Neil Jacobs back in 2010. Moreover, a modular approach towards software that fulfils various related functions in managing web content related to research clearly has a much longer history, and is not in itself particularly surprising in software development more broadly. However, it seems that microservices as an approach is gradually acquiring a clearer identity within this space, so it may be worth taking a look back at the nature of the types of software used in managing research content of various types, how they are related, and whether and to what extent terms like “repository”, “Current Research Information System”, “Research Information Management system” and so forth overlap in terms of software functionality that they offer.

Defining terms: “repository”, CRIS, RIM etc

Institutions within Higher Education are often faced with questions of procurement such as technical suitability and sustainable technical support. Although these areas are broader than those normally covered by the Technical Foundations web site, since they encompass non-technical considerations related to funding, policy and practice that drive software acquisition in universities and related institutions, the purely technical aspects are squarely within scope and of considerable interest to the community at large in terms of developing useful technical guidance.

The question “What is a repository?” is likely to have a range of possible answers, but Neil Jacobs noted the revival of an approach summarised in Cliff Lynch’s 2007 description of the institutional repository as “a set of services that a university offers to the members of its community for the management and dissemination of digital materials created by the institution and its community members”. Without reiterating the points made by Neil Jacobs in detail, suffice it to say that these efforts have been led by institutions such as the California Digital Library and notably by John Kunze and others. The difficulty with this approach in general is not a purely technical one but one of technical resources, and it is not unique to the microservices approach but can for example be seen with systems such as Fedora Commons as well.

Software development approaches

While the most modular, customisable and flexible technical approaches can often be adapted most quickly (and arguably most effectively) to the challenging technical demands placed on them, it is usually the case that significant development resources, usually in-house, are required in order to tailor the software to local requirements. In practice, the result is often that only certain large institutions are able to justify and support software systems such as Fedora or even “roll their own” local software solutions. A useful example is the eSciDoc suite of services, developed by the Max Planck Society and FIZ Karlsruhe. Together, these effectively represent what in other contexts (e.g. the Linux world) might be called a “distribution”, in this case based on Fedora. It is also worth noting that these services have been developed so that they can be used independently of eSciDoc, for example with DSpace or another repository system. In this way, true to Cliff Lynch’s definition, each aspect of what together we call a “repository” is handled by a different piece of software, which then interoperates with a range of other web services according to local requirements.

“Does it do more than we already do?”

This, in a nutshell, is the microservices approach. However, there is no reason why the question should be restricted to repositories, since “repository” is itself something of a catch-all term for a class of web content services that are by no means identical in their principal functions and aims, even where they are using the same underlying software. Where, for instance, does the functionality of a repository end and that of a research management system, research information management system or Current Research Information System begin? Without a clear understanding of what these systems do, it is possible if not likely that higher education institutions, especially where decisions about procurement could be made by relatively non-technical managers, might easily end up acquiring more than one system with overlapping functions. Clearly, in times of difficult financial circumstances, this ought to be avoided wherever possible. It is worth spelling out what exactly different systems do in order to minimise duplication of effort.

Similar software issues facing HEIs

The question need not be limited to repositories and research information management either, although it is not the intention to get into great detail in this particular blog post. For example, libraries are frequently offered new products either by vendors with whom they have existing contracts or by their rivals. It is always in the interests of a vendor to sell a new product, so the question of duplication of technical functionality and/or the most effective technology to address a local need is of far more pressing concern to the institution than the vendor. A range of commercial library portals are on offer, built on but extending the functionality of library catalogues and commercial publications databases related to e-journals such as Web of Science. It is a common experience amongst library staff to feel unsure to what extent new software is offering new functionality, how it fits their technical requirements, and to what extent it may be re-packaging existing functionality in new clothes. The same could perhaps be said, for example, of systems relating to human resources or institutional finance offices.

What else can these systems do?

Returning to repositories and research information management, it is clear that a wide range of resource types are being published on the web through a range of related systems. The best recognised use of the repository is as a research publications repository, which is usually how the wider term “institutional repository” is understood within the context of higher education and issues relating to but not confined to Open Access. Increasingly, attention has turned to Current Research Information Systems, based on the CERIF standard, and similar research information systems. Of particular interest is the RMAS approach, effectively building such a system from a range of related pieces of software, i.e. a microservices approach outside the limits of the repository sphere. Research information management covers all aspects of the processes of research creation and dissemination, including research reporting, human resources, finance and publication, while publications repositories commonly focus only on the last of these. This is usually the area where institutions operate systems whose functionality overlaps, as there is no reason in principle why a CRIS, for example, cannot expose research publications on the Web: this is possible with the main commercial systems such as PURE and Converis, for example.

In any case, there is no necessary limitation on the term “repository” to cover only resources relating to the outputs of research. Teaching and learning materials, amongst a wider range of educational resources, are another major area that has seen substantial growth in the last two or three years. Various types of media resources from images to time-based media such as audio and video recordings are found in institutional repositories for a number of different academic purposes, e.g. art collections, media archives, music collections, health information and so on, not all of which are the direct products of either research or teaching but may be connected with one or both. In this context, it is as well to remember that the term “repository” means little more in essence than “organised place or system to put something [on the Web]” and that many such systems, especially older ones, have always been known as “digital archives”, “electronic libraries”, “media collections” and so on, in contexts where the word “repository” would still not generally be recognised. Large data collections are often stored in systems that are, in effect, repositories, but whose development has been through systems not normally known by that term.

Solutions that fit problems

In summary, dividing the world of software systems in academic and related outputs too rigidly into “repositories” and “research information systems” may be at the root of many of the difficulties that arise in understanding which technical functionality is required for any given local purpose and the extent to which systems overlap. A better, more precise understanding of these functionalities would help to avoid unnecessary duplication of effort and proliferation of systems. Some approaches are effectively bundled within one piece of software for a particular purpose, e.g. DSpace and EPrints in the repositories space. These offer a conventional set of services that fit the requirements of most institutions but may place some limits on the ability to customise those services indefinitely. Even these systems are built to be general purpose systems with considerable potential for local customisation. However, there is the tendency seen elsewhere (for instance in open source software with a large and disparate user base) to introduce software bloat: more and more functionality, some of it never used by the majority of implementations, is shipped with each succeeding version as new scenarios are encountered.

Microservices sit at the opposite end of this spectrum, although they potentially introduce the problem of finding and sustaining sufficient technical development effort. Each service is ideally a separate entity on the web server, built for maximum interoperability with the other services that may be required for local purposes. Rather than acting as plug-ins to a base software system (which is perhaps an intermediate approach), these are separate code bases able to run independently, even where they may have been intended, as in RMAS or eSciDoc, to be used frequently together. The technical issues and demands of each system will be different in every case.


The business of unique identification

This post was originally published on the Technical Foundations web site at UKOLN.

What need is there for unique identifiers?

Put in relatively non-technical language, there is an increasing concern in information science in general to uniquely identify different things, organisations or people that could otherwise be confused, whether on the Internet or in the physical world. In technical terms, these are all referred to as resources (even if people might find it vaguely demeaning in normal language to be considered as such). This need, whether real or perceived in any particular context, has grown as the complexity of information available on the Web has grown almost exponentially, increasing the potential for confusing similar resources.

Why aren’t names good enough?

1. People

It is not necessarily enough to have a name, since even a relatively unusual combination of names may easily not be entirely unique from a worldwide or even universal perspective. At the basic level, John Steven Smith might be unique in a place called Barton, but even if you cross-reference name and place, two people with the same name could easily be confused, for example if there are several possible places called Barton.

My own name, Talat Zafar Chaudhri, might appear to be more unique until you realise that these are all fairly common names in the Indian subcontinent and thus in the Indo-Pakistani diaspora, so it is reasonably possible or even fairly likely that another individual exists with this particular choice of spelling (of which others exist). I am also Talat Chaudhri, T. Chaudhri, T Chaudhri, T.Z. Chaudhri, TZ Chaudhri and similar variations (with or without spaces and punctuation) that might make it harder to decide which references to reconcile as a single individual, especially by machine processing. At least I do not vary the spelling of my surname, but some people may, especially in cases such as my own where other transliterations are possible: for example, my father previously used the spelling Chaudhry, and many others such as Chowdhary and Chowdhuri are equally possible. I understand when companies misspell it, but a computer might not be sure whether these were definitely the same person, even if it went to the lengths of calculating a probability for this.
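To make the machine-processing point concrete, here is a minimal sketch of how software might normalise and fuzzily compare such name variants, ending up with a similarity score rather than certainty. It is purely illustrative: the variant list and the 0.8 threshold are arbitrary choices of mine, not part of any real matching system.

```python
# Illustrative sketch: normalising and fuzzily comparing name variants.
import difflib
import re

def normalise(name: str) -> str:
    """Lower-case, strip punctuation and collapse whitespace."""
    name = re.sub(r"[.,\-]", " ", name.lower())
    return " ".join(name.split())

reference = normalise("Talat Zafar Chaudhri")

variants = ["T.Z. Chaudhri", "TZ Chaudhri", "Talat Chaudhry", "T. Chowdhary"]

for v in variants:
    score = difflib.SequenceMatcher(None, reference, normalise(v)).ratio()
    verdict = "probable match" if score > 0.8 else "uncertain"
    print(f"{v}: similarity {score:.2f} ({verdict})")
```

Even a high score remains a guess: genuinely telling two people apart requires an identifier, which is where the rest of this post is heading.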

Moreover, people change personal titles (e.g. I have been both a Mr and a Dr and I am occasionally still referred to as the former by companies that do not allow for the latter option); they have multiple, changing work roles and work places, and may be known in multiple contexts, e.g. work, social, voluntary roles and similar. At work, one may have additional roles in various professional bodies, so it may not be apparent who is who. Two people might have the same name in a large professional group, e.g. physicists, and may even produce outputs related to the same subject. Who owns which ones? This is a particular issue for electronically available outputs on the Internet, e.g. publications, educational resources, audio, visual or audiovisual resources and so on.

2. Organisations

The same issue arises for organisations. Can we be sure that a Board of Licencing Control is unique? No. Perhaps it is merely the Board of Licensing Control under a different spelling? What if one, but not all, of these were re-named as Burundian Licencing Control? What if the Board of Licencing Control merged with the Department for Regulatory Affairs under either of these names, a combination, or an entirely new name, yet continued their association with the assets of the originals? De-mergers are likewise possible, and may present issues of uncertain ownership of resources.

Perhaps there are organisations with this name in several countries but serving utterly different purposes, and perhaps one is merely one possible translation of a term into English but used natively in another language. Historical names have been used in multiple contexts that may still be valid, e.g. the Irish Volunteers, and these might need to be kept clearly separate from each other. Conversely, there are also organisations that have multiple names or forms of names, whether in one language or in multiple languages or during their history: e.g. Óglaigh na hÉireann is Irish for both the terrorist Irish Republican Army (IRA) and most of its subsequent splinter groups, but is also, for historical reasons, an acceptable name for the Defence Forces of the Republic of Ireland, and previously just the Irish Army (an tArm) that now forms a part of it. These are clearly not the same and must be distinguished. It must also be noted that typographical constraints and character encodings will lead to yet more duplicate forms.

Isn’t this bigger than the question of unique identification?

Yes, the need for complex metadata to express these things can go far beyond merely identifying resources in a unique manner. However, before one can even start thinking about complex descriptive and relational metadata, one first has to be clear which resource is mentioned: hence the first step must be unique identification of what it is we are talking about. Only once we have done that can we feel reasonably confident about talking about how resources relate to one another and how they may have changed over time.

Overall, there is an ever-increasing need to make clear what is meant, as more and more things and agents have on-line identities that need to be distinguished, whether as an owner of resources or as a referent within a resource, e.g. the subject of the resource in a particular context, and even in terms of the role played and the relationship to other resources or agents, perhaps in a specific time period. Information models can quickly become extremely complex, and this is certainly true where identity is concerned.

What is an identifier?

An identifier is similar in its basic concept to a name. At its most basic, an identifier in the context of an information system is a token (usually a number or a string of characters) used to refer to an entity (anything which can be referred to). Identifiers are fundamental to most, if not all, information systems. As the global network of information systems evolves, identifiers take on a greater significance. And as the Web becomes more ‘machine readable’, it becomes vital for all organisations that publish Internet resources to adopt well-managed strategies for creating, maintaining and consistently using identifiers to refer to the assets they care about.

What are unique identifiers?

The simple answer is that unique identifiers are the only way to avoid misidentification confidently, and therefore to prevent errors about ownership of or rights over resources, as well as to make sure that large bodies of resources contain reliable information generally.

The fundamental question is whether the identifier or token that has been chosen is unique and how best to ensure this. Some identifiers are so complex that mathematical probability makes them effectively unique in the universe, notably UUIDs. In essence, a UUID is no more than a complex numerical token: all it offers over, for example, a running number is additional complexity (and thus uniqueness). Others, like names, can only be distinguished unambiguously by making a series of statements about which names are considered equivalent, which contexts (e.g. a person’s work or town) are valid, and so on, where a number of relationships have to be attached to a particular identifier and checked in order to reach an acceptable level of uniqueness and to eliminate any mistaken connections with resources that might be similar in name, or perhaps also in other respects, by chance.
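As an aside, generating such a token is trivial, and the contrast with a running number is easy to see; a quick illustrative sketch in Python:

```python
# A version 4 UUID is essentially a 122-bit random token: its uniqueness
# rests on the improbability of collision, not on any central registry
# or on any meaning carried by the token itself.
import uuid

running_number = 42      # unique only within the system that issued it
token = uuid.uuid4()     # effectively unique anywhere

print(running_number)
print(token)             # e.g. something like 9f1b6c2e-8c4a-4f0e-9c1d-...
```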

The problem with UUIDs is that, while the chances of them failing to be unique are, for all practical purposes, non-existent, it is not very clear from a UUID alone what the nature of the resource is. It may be machine-readable but it says nothing about who generated that identifier and when, or which other identifiers might exist for the same resource in different systems that also generated an identifier for it. Consequently, the need to associate other metadata with any complex number or similar token remains (including but not limited to UUIDs). Simply put, no single token can be sufficient for any complex purpose and, at the very least, an electronic or physical resource must be referenced for the token to have any useful meaning at all.

This is effectively what a URL is: another type of token. While I will not go into the whole discussion about URLs and URNs as sub-types of URIs, it is worth noting that, in many quarters, the term URL is no longer preferred despite being the most commonly used in practice. In strict terms, there is a clear difference: while a URI usually resolves to an electronic resource, which may be either a description of a physical or electronic resource or an electronic resource itself, there is technically no requirement that a URI be resolvable; it need only be a token, and does not have to represent an address that actually delivers a resource. However, it is usual to use the HTTP scheme, which is designed for delivering such a resource, so it would be somewhat eccentric and misleading if one were deliberately to choose an ostensibly resolvable syntax that does not in fact resolve. In effect, virtually all such URIs are also URLs (unless a resource has become unavailable and link rot has set in), since the latter must locate the resource or a representation of it: this is inherently useful. Any URI that resolves, i.e. a URL, will be effectively unique within the standard Domain Name System (DNS). As a result, there is no absolute need for UUIDs in many contexts, since a sufficiently unique and practical token already exists in the URI. Any unique but arbitrary token serves the core purpose here.
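To illustrate the point that resolvability is a property you can only discover by dereferencing, not something guaranteed by the token’s syntax, here is a small sketch; the identifier URI is hypothetical.

```python
# Minimal sketch: test whether an ostensibly resolvable HTTP URI
# actually delivers a resource (i.e. behaves as a URL).
import urllib.error
import urllib.request

def resolves(uri: str, timeout: float = 5.0) -> bool:
    """Return True if dereferencing the URI yields a usable HTTP response."""
    try:
        with urllib.request.urlopen(uri, timeout=timeout) as response:
            return 200 <= response.status < 400
    except (urllib.error.URLError, ValueError):
        return False

# Syntactically this is a perfectly good identifier either way;
# only the request tells us whether link rot has set in.
print(resolves("http://example.org/id/person/12345"))  # hypothetical URI
```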

Aren’t identifiers really just names?

Yes and no. Names are intrinsically arbitrary too when they are first given. However, they are identifiable on a number of levels from a human perspective. In addition to a combination of names belonging to one or more particular linguistic and/or ethnic origins and usually identifying gender, they quickly become associated with a particular person, so their use in uniquely identifying that person within a given context becomes central to maintaining the person’s reputation in whatever they do. This is, for example, particularly important to academics in Higher Education. In modern times, this name resolution needs to be done globally wherever the Internet is the context, whereas previously it would have been possible to use fewer additional pieces of information in more restricted contexts (e.g. a village, a country etc), depending on the purpose. These different contexts still co-exist but it is now necessary to provide as many as possible, since one cannot control or predict why the information is being requested in each instance on a global system such as the Internet.

How does this affect Higher and Further Education?

Increasing numbers of professionals and the bodies that they work for and represent need to describe their resources on the Internet, whether those are in themselves electronic resources, whether they are descriptions of electronic or physical resources (metadata), or whether they are other representations of physical resources, perhaps in addition to themselves being electronic resources (e.g. photographs). This is a particularly pressing issue in Higher Education and, to an increasing extent, in Further Education. Academic outputs may include publications, educational resources, visual, audio and audiovisual resources and so on. Perhaps the best known is the issue of scholarly publications, partly through the rise of the Open Access movement to make such resources freely available.

There are already a range of identifiers for academics and related professional university staff. One of the problems is that these are created for specific purposes that only cover whichever subset of staff is relevant to those purposes. For example, HESA keeps records that contain a HESA number for academic staff, which means that at least those who have published academic outputs will have such a number. Another number, the HUSID number, is maintained for students, since tracking academic careers from student to staff is one important concern for HESA. Many academics in relevant fields may have ISNI numbers, which are used widely in the media content industries. Many academics will have one or more professional staff pages, including within repositories and Current Research Information Systems (CRIS), each with a URI, not to mention OpenIDs and URIs associated with Web services which they use professionally and/or privately, e.g. LinkedIn, Academia.edu, Facebook, Twitter and so on.

Here are some examples belonging to Brian Kelly of UKOLN:

The problem is that the coverage of these numbers is not universal within the HE sector, and there is no single recognised authority or other agreement to prevent and resolve conflicts where information is not consistent between two or more information sources.

At present, the JISC are trying to solve this through the Unique Identifiers Task and Finish Group, which also includes representatives of HESA, HEFCE, the various Research Councils in the UK and UKOLN. The preferred solution is currently the ORCID academic identifier, which is being developed internationally with publishers, with a great deal of input from the United States in particular.
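As a technical footnote, an ORCID iD is a sixteen-character identifier whose final character is a check digit computed with the ISO 7064 11,2 algorithm, so that mistyped iDs can be caught cheaply. Here is a sketch of that calculation, checked against ORCID’s well-known sample record (Josiah Carberry, 0000-0002-1825-0097):

```python
# ISO 7064 11,2 check character, as used for the last character of an
# ORCID iD (and of an ISNI, on whose scheme ORCID iDs are based).
def orcid_check_digit(base_digits: str) -> str:
    """Compute the check character for the first 15 digits of an iD."""
    total = 0
    for ch in base_digits:
        total = (total + int(ch)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)

base = "0000-0002-1825-009".replace("-", "")  # first 15 digits
print(orcid_check_digit(base))                # 7, matching ...-0097
```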

In order to succeed, any such identifier will need international penetration of the higher education sector, since academics will not use it unless it delivers the sorts of interoperability benefits that make their work easier and become integrated into the recognised systems required of them by funders and publishers in the course of their work. Since students and academics change roles and institutions, this needs to be recognised and outputs properly allocated to institutions and departments, which may themselves change identities, merge and de-merge over time.

While institutions will need to reduce the workload on academics by bulk loading information about staff, since the main incentive to use the system is that every academic has a record, there is also an issue about control. Should academics have the ability to alter their records at will? Are assertions automatically trusted or does a particular record for an academic’s time at an institution need to be verified by that trusted body? Who should maintain a list of trusted bodies who can back up assertions? How will this effort be funded sustainably? It becomes clear that some of these points are central structural concerns whereas others may cover only fringe issues such as avoiding deliberate falsification, which may be rare.

Proprietary academic identifiers

There are also a number of proprietary identifiers associated with different commercial services related to electronic publishing and related academic service industries. Thomson Reuters and Elsevier provide identities for individuals and organisations as part of their bibliographic and academic services; similarly, search services such as Google Scholar (see the study in this blog post) and Microsoft Academic Search have also started to offer identifiers (see this blog post). There may be privacy issues, for example in Google and Microsoft publicly surfacing information about researchers without explicit consent: while this information might have been suitable for the limited purpose of publication, academics may not have intended for it to be synthesised into a single, public description of their personal details available to all.

Some of these services introduce new problems, since their primary purpose is commercial and it is often less of a priority to deal with the internal issues facing academic institutions unless that impacts significantly on the ability to make commercial profit. These may be resolved over time or be reintroduced as services change and compete: the academic has little or no control over the effects of commercial decisions upon their work. For example, Microsoft Academic Search often misrepresents outputs as belonging to similarly named individuals (thus is currently failing at unique identification) and, by default, requires the manual input of researchers to edit out errors and take a proactive approach towards managing the information about themselves. This brings the overall quality of data into question: for large-scale statistical purposes, this could be tolerable, depending on the degree of error; however, for academic citations and reporting purposes such as the Research Excellence Framework (REF), it would not be acceptable to use this data without further refinement, which would most likely remain a long, manual process.

Software and services

Any software application layer, whether operated by commercial companies, higher educational institutions, funders or governmental bodies, needs to be maintained. If information is harvested or processed automatically, it needs to be clear who corrects that information where errors are found, and what resources exist for academics to contact individuals who have the time and effort available to improve the data as part of their work. In the case of commercial organisations, this is usually unclear and may change. There is no guarantee that the commercial reason for providing services will continue over time, unlike in most cases in the public sector within Higher Education. Coverage of such commercial services is often geared towards institutions rather than individuals: for example, Google Scholar requires registration using a valid university email address that it recognises, which would exclude private scholars and perhaps some retired staff who produce research.

The Web of Things

It has already been mentioned that electronic descriptions or other representations of physical objects may be found on the internet, including written descriptions, pictures, geographical locations, dimensions and so on. It is even possible to describe physical objects that were extant but are now historical, or which have moved or whose location is now unknown, referencing comparable objects and linking these descriptions with other resources that are related. In each case, the nature of the relationship, relevant agents who may have been responsible for it, and when it was valid can be described in metadata.

This opens the way for the Web of Things, a term used to describe that part of the Semantic Web that covers physical resources as opposed to, or as well as, purely electronic ones. Some authorities use the term to mean physical objects with miniaturised electronic devices to enable them to be located, whereas others merely mean any physical object that is described in a record on the Web. It may be argued that all electronic resources have relationships to physical ones, even if that is only with regard to authorship and subject. The Resource Description Framework (RDF) provides a means to describe these relationships and transmit information about them in ways readable to humans and machines. Although these are usually expressed as triples, where two things are described with a relationship between them, metadata structures such as the Common European Research Information Format (CERIF) can add link tables that give far more detailed information about the relationships themselves. All of this can be made available as Linked Data and surfaced in many software applications on the Web.
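By way of illustration, here is a minimal sketch using the Python rdflib library of how one such triple might be expressed and serialised as Linked Data; the identifiers, the invented name and my choice of the FOAF vocabulary are purely for example.

```python
# Sketch: express "a person made a paper" as RDF triples and serialise
# the graph as Turtle, ready to be published as Linked Data.
from rdflib import Graph, Literal, Namespace, URIRef
from rdflib.namespace import FOAF

EX = Namespace("http://example.org/id/")   # hypothetical identifier space

g = Graph()
g.bind("foaf", FOAF)

person = URIRef(EX["person/42"])           # hypothetical person URI
paper = URIRef(EX["output/paper-7"])       # hypothetical output URI

g.add((person, FOAF.name, Literal("Jane Smith")))  # invented name
g.add((person, FOAF.made, paper))                  # the relationship itself

print(g.serialize(format="turtle"))
```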

The Semantic Web is often seen as a utopian view of a future where no electronic resources will be published without complex information being provided or automatically generated about its origins. The reality is that manual entry of information is generally very limited unless it serves the purposes of the person entering it, and this cannot be relied upon as an approach to ensuring large-scale, consistent metadata on a sufficient scale for the Semantic Web to work. Technology has in some cases improved to the extent that geographical and technical information is now automatically produced, for example in digital cameras and in mobile phones able to record GPS coordinates.

However, the effort and cost that would be required to catalogue the entire physical world make it highly doubtful that this is even possible. Where the Semantic Web could be useful is within particular large bodies of data, for example experimental scientific data, publications and so on. In the case of the Web of Things, this could include art collections, photography, archaeological information, the locations of public institutions and many more. For all of these purposes, it will be necessary to provide unique identifiers for increasingly large numbers of resources, including things and agents, in order to provide complex metadata about them.

Education in the wider world

It has perhaps not been sufficiently investigated how unique identifiers for researchers and other staff in Higher Education will fit into the wider question of unique identification on the Web. Relevant purposes might be:

(1) commercial, for example the identification of companies and individuals owning the rights to photos, music, video or publications, particularly legacy resources of ongoing commercial value in terms of royalties and performance licensing.

(2) governmental, for example biometric information about people, used in border controls, crime prevention and citizenship contexts; or about public or private organisations such as charities, political groups of interest to law enforcement etc. Information about individuals, in particular, may be subject to privacy laws, which will vary between jurisdictions.

It is clear that there are interfaces between the various agents and outputs of academic institutions and many other purposes, notably those commercial and governmental activities already described. For example, a foreign student or member of staff seeking a work permit will require institutions and governmental bodies to use personal and citizenship information co-operatively, which will be linked to their academic identity in the course of their work at the institution. Some of this information will be private and some public, so there is an issue about who can see which parts of a particular corpus of Linked Data, requiring authentication protocols and systems.

The extent to which consistency of approach between HE institutions and other sectors and contexts can ever be ensured is moot, since there is of course no single international authority and because any single metadata solution that tried to cover so many diverse purposes would be fatally unwieldy. How different, flexible approaches can be understood by machine processing is perhaps the technological key to how well the Semantic Web will answer these questions in future, both within Higher Education and beyond.


Diaspora – a distributed social network

Just over a year ago, I wrote a blog post about Diaspora, the new distributed social platform that is now in alpha development. What is it? What does this mean? Why would anybody want this rather than Facebook? Is it doomed to failure?

The reason that four students of computer science from New York set about writing this open source software, for which they have secured significant funding, was principally about privacy and licensing concerns. Essentially, large companies like Facebook, Twitter and so on can (and frequently do) change the service and terms of use at will, including the rights that they have over your data or how they will expose it to the public. Since there is little alternative but to sign up, or else not use a service that connects everybody else, there is effectively an unbreakable and unhealthy monopoly over personal data in social networking.

Somebody will always have control over your data unless you run the server yourself or else set up or join a group whose collective aim is to preserve the privacy of its users in so far as it is able. Even so, you must shop around and the available choice is likely to be limited: it’s not an ideal world when it comes to data privacy. What Diaspora does is to give you the ability either to run your own server or join one that you do trust, thus seeking to improve the available choice (in the “Cloud”). There are several levels of trust to consider here, but the main broad options perhaps seem to be:

(1) Large providers like Facebook or Twitter have the infrastructure and resources to provide security – though the recent case of Sony being hacked by the LulzSec group proves that everybody, big or small, can potentially be hacked. They are less likely to disappear any time soon than small services. However, it is usually difficult or impossible to extract your data and, even if you can, to get it in a form that is re-usable elsewhere. Privacy is usually minimal and terms are frequently altered as the company tries to manoeuvre itself into a better marketing position: this is ultimately the reason that it holds your data, so that it can mine it on a large scale in order to be able to target advertising at you and others.

(2) Small providers, especially those with an open source ethos, tend to have a far stronger interest in data privacy – at least, while they remain small! In the case of small companies, this is their main sales pitch and advantage over larger companies, whose commercial interests have led them away from such concerns. The downside is that small providers have a consequently more limited income stream in terms of advertising, and have fewer resources to take advantage of your data in order to maximise their income. That means they are less likely to survive. In the case of providers linked closely with the open source movement, privacy is likely to be best protected but the business model that underlies the survival of the service is likely to be weakest. You will almost certainly be best positioned to extract and re-use your data elsewhere in a similar service (this is no automatic guarantee if standards change, but Diaspora is the first and the likely standard from now on), but that may mean frequently changing your identifier on the Web. Unique identification makes it easiest for people to find you, although on today’s Web it should not be too hard, if you want to be found. It’s a bit like changing your land line or mobile (cell phone) telephone number: so long as you tell everybody when it changes and they still want to keep in touch, is this a critical problem or not?

Facebook and Twitter have the advantage of being first into the marketplace, which means they have got a large number of users. That means, if you want to find friends, that is where they will be. But remember how dominant MySpace was? Well, today you won’t find many adults there any more, as it’s mostly declined to a point where the only communities that it has retained are teenage (but it by no means dominates that market) and bands. That said, Facebook and Twitter could decline too, if a better competitor arrived that provided what people want. That does not mean that they will. For instance, has identi.ca (as it sounds, identical to Twitter but open source and distributed over many independent servers) replaced Twitter? No, because the service adds little that is substantial over Twitter, and the latter has not yet alienated its users to the point where they wish to leave in numbers. Twitter is not about privacy or control. By using it, you expose what you say to the world anyway.

It must be said that Diaspora is still in alpha development, so we ought to forgive it to some extent for looking a bit ropey still. Compared to Facebook, it is currently third-rate, although it’s a major advance in only a few short months that it is working at all. Of course, it has no community because it has not been publicly released and belongs only to the hard core of developers as yet.

This means that we still have to speculate a little on how successful it could be, but now we have a bit more to base this on compared to a year ago when the idea was launched. To start with, the fact that the software exists and is developing fast is reassuring: it shows that these developers can raise funds and they can produce working software quickly: they know what they are doing. That allows us to predict that the currently limited functionality will improve dramatically.

Diaspora is more like Facebook than Twitter, though I think it will remain a little less cluttered than Facebook. This is by no means necessarily a bad thing: today a majority of people use Google over Bing or Yahoo for the lack of clutter as an aesthetic preference: this was an improbable and most likely unintended master stroke of design, on the part of Google, that will probably never be repeated in quite the same way.

Still, we must focus on comparative functionality rather than purely on design. That is to say, is there anything that fatally undermines Facebook that Diaspora does provide? If so, will a critical mass of people ever discover this and migrate in numbers? This is an enormous challenge. Statistically, the answer has been no: many better services fail to survive or remain niche because of the lack of successful mass marketing and market penetration, whereas worse ones survive and thrive purely because they have the resources and ability to market successfully. But let us remember that Facebook succeeded through viral marketing, i.e. cheap or close to free channels, relying on word of mouth. Its traditional marketing has only blossomed since then, and is not the reason for its success. In fact, its traditional marketing is one of the things that annoys its users the most! So Diaspora has some hope. This is not the early days of the Web. Honestly, it is difficult to predict where things will go next, as they are often very surprising.

What do identi.ca and Diaspora have in common? The first one has a name that sounds inherently unoriginal (because of the semantic meaning of “identical”) and, in ordinary language, like a rip-off. That’s bad marketing. Diaspora is a terrible name. It has the connotation of a community that has drifted apart to the point of irrelevance and weakness, like many historical diasporas (e.g. the Jews, the Roma/Gypsies etc). The name Facebook contains “face” (something we all care about) and “book” (something that contains information). The latter is perhaps a strange choice for the Web, but the combination is successful. Whilst not an obvious master stroke of psychology and semantics, which is to say that its meaning is not immediately clear to an English speaker who had not heard it before, it certainly isn’t a disaster. It has worked and it has taken off.

But Diaspora is not marketing itself as a single service. Each “pod” will be a node on a network, an independent service on a separately owned server marketing its own particular implementation to the public. It will be that name that you see. Hmm, you might say, how could that succeed? Well, that is how email works. And email is the most successful internet communication system AND unique identifier system so far devised. To register for a web site you need an email address to identify yourself. It is widely acknowledged that Facebook private messages are inferior to email and modelled on it. Email is the yardstick for private messaging, and will no doubt remain so. Google Wave, of course, was the only service ever to seek to displace it, and it singularly failed.

What does Diaspora have that Facebook does not? It has the ability to group different sets of people into “aspects” such as “Friends”, “Family”, “Work” and whatever else one may define: and the same people can, as in human society more widely, appear in multiple aspects of one’s life. Only the user sees the aspects themselves: the friends see just what is shared with them, again as in life more generally. One can broadcast to all aspects, which is like Facebook’s implementation: this is the default. It’s easy to use and intuitive, as well as having something obvious that Facebook lacks. Also, like Twitter, you don’t have to have two-way friendships. Of course, you will know if you follow a person who does not follow you, which reveals that they aren’t interested in you.

For the full functionality to work properly, you need to be mutual friends, as in Facebook. If you want a person to think you are mutual friends, add them to an aspect like “Boring people”. At present, you can’t hide this aspect from the “All Aspects” (main) page, a critical function that has been suggested, but you could bookmark each aspect you’re interested in separately and only look at those, as a work-around for now. Though they will never see the name of the aspect, you can decide whether people see other people in the same aspect as themselves, which enables them to make friends within sets of friends over which you have control, without exposing all of your friends. This limits (but can never entirely prevent) so-called “gaming” of the friend system purely to increase the total number of friends for perceived prestige or for spamming purposes.
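To make the aspect model concrete, here is a toy sketch of my own (emphatically not Diaspora’s actual code): aspects behave like named sets of contacts, and a post is visible to anybody who appears in at least one of the aspects it is shared with, while the aspect names themselves stay private to the owner.

```python
# Toy model of aspect-based post visibility.
aspects = {
    "Friends": {"anna", "ben"},
    "Work": {"ben", "carol"},
    "Boring people": {"dave"},   # dave never sees this aspect's name
}

def audience(targeted_aspects):
    """Everyone who can see a post shared with the given aspects."""
    return set().union(*(aspects[name] for name in targeted_aspects))

print(audience(["Friends"]))          # {'anna', 'ben'}
print(audience(["Friends", "Work"]))  # ben appears once, via either aspect
print(audience(aspects))              # "All aspects": the default broadcast
```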

What doesn’t it have? Well, so far it has no groups, which it will have in future. That absolutely needs to happen. It has no games, which I doubt it will ever have. So those that use Facebook to play games will probably stay there. There is no advantage in duplicating all that, to which privacy and control issues don’t really apply. None of that part of a user’s information is personal, although it can link to personal information. Things like Causes on Facebook are very popular because they allow people to express opinion and membership related to things that they care about, whether they are to do with politics or about human or animal welfare. But Causes is really an extension of the idea of groups. If groups work properly, Causes should be unnecessary.

Diaspora also needs to develop events, and I hope that this will also be added. It will have proper email for those providers that wish to offer it, not just the terrible imitation provided by Facebook, and on diasp.org you can already forward email to your normal email address. These are both necessary features, and will no doubt be implemented.

The system of making friends in Diaspora is not good at present. On the other hand, let’s be fair in our comparison: the system of finding friends in Facebook has changed over time, and it is currently a great deal worse than it used to be. The suggestions system is useless, and all previous variants have been useless too. You can’t combine a text search for a person’s name with a search by mutual friends, network or place in Facebook: it’s one or the other. Sometimes it doesn’t find people or events that are there and have made themselves publicly visible.

But Diaspora has a problem in that the distributed nature of the network means that you can only search within the pod unless the pod has previously been told about the existence of the other pod where you want to look, i.e. you (or perhaps somebody else on your pod) have previously searched for a contact there. You have to know the Diaspora handle or identifier, which looks like an email address, e.g. me@diasp.org etc, because unlike on Facebook you can’t automatically expect that all contacts will be on the same service. But this is no harder than sharing a phone number or email address.

At present, even when this is working properly, it means that the first search fails and you have to do it twice before it succeeds. This flaw needs to be overcome, as most people will assume that the person doesn’t exist and won’t look again. In practice, it barely works: I have tried diasp.eu, diasp.org, my-seed.com [Ed.: no longer exists, 2013-09-03], and londondiaspora.org [Ed.: Diaspora no longer on server, 2013-09-03], of which only the latter two managed to make mutual friends with each other, after several attempts. That said, once it worked (and remember, it’s only in alpha development), the posts appeared within a few seconds, which is likely to be as good as Facebook despite being on separate, independent servers.

Diaspora can already integrate with both Twitter and Facebook: the former is useful because many people will remain with Twitter; the latter is useful for the purposes of transition and viral marketing. As far as I have seen, posts from Facebook and Twitter don’t seem to appear on Diaspora, however. Finally, you can post publicly, like on Twitter, identi.ca, Plurk and so forth. This is all fairly intuitive, and it defaults to the Facebook model, i.e. all of your contacts unless you select public, limit it to only certain aspects, or choose Twitter or Facebook.

Do I think that (a) Facebook will collapse through a great privacy scandal at just the right moment for Diaspora to step into the breach, once it has reached public release? (b) that people will come to think “MyNamedSocialProviderNetwork.com” = “social networking” and not particularly worry that the underlying software is called Diaspora? (Compare Drupal or Joomla etc underlying many modern web sites.) This is surprisingly possible, but depends on so many complex factors that I don’t feel able to gaze into my crystal ball at this precise moment! Still, what I will say is that Facebook could die, as MySpace has partially done before it, and perhaps even that one day it will die when fashions on the Web change. Its legacy will remain – and perhaps that will be in a more generic type of service. If so, Diaspora could be the technology that underlies it. It is not in itself a single Web service, but the first independent platform enabling them to be built.

Diaspora has an uphill struggle to defeat Facebook, for which it is not yet ready. But in a few months, when I predict it will be ready, it will be interesting to see how this unfolds. As an optimist who favours the underdog and the open, more democratic solution, I’d be disappointed if it utterly failed. This is not Google Wave. Social networks have been tried as a technology, whereas Wave servers had not. Diaspora is not, unlike Wave, trying to put all its technologies in one place, e.g. IM, email, collaborative documents (similar to Google Docs) etc. It is simply trying to provide a simple, effective and privacy-aware social networking system based on the best elements of existing ones. We wait to see whether the distributed model will be transferable from a technology like email to one like social networking. This is the real innovation. Is this just an ideal, and is the world yet ready for it?

Update 2011-06-29 09:25 UTC+1 (BST): As proof that the concept of “aspects” is valid, a straight copy has been made by Google Plus in their “circles”, though of course theirs will be a closed social network exploiting your personal data for their gain. It’s impossible to believe that they were unaware of Diaspora. This is an extremely underhand way to behave, to copy an idea without crediting it. The mainstream reports have so far ignored Diaspora (see my further comments below).


A-salaam-o-alaikum

Islam has often been demonised in the West as a violent religion. Notably, terrorists from the Muslim community have been labelled “Islamists”, an invented word that implies that these individuals are acting in the interests of Islam. This is clearly not the case. The vast majority of Muslim people support the cause of peace, not a damaging war with the West that will ultimately kill far more Muslims than Westerners. It is not an us-and-them world. The interests of Islam and its people are not terror and war, but peace.

This is embodied most obviously in the Islamic greeting A-salaam-o-alaikum (even non-Arabic speaking Muslims greet each other and pray in classical Arabic, even though most cannot speak Arabic otherwise). It means simply “peace be upon you”, to which one may answer the same in return or else Wa-alaikum-salaam “And peace be upon you”. This could of course be said even by a Christian Arab, or indeed by anyone of any faith or creed.

The word salaam “peace” is the direct equivalent of the Hebrew word shalom, and it can also be used by Muslims at greeting or parting, in exactly the same way as is done by Jews across the world. I recall my father pointing this out to our Jewish neighbours’ daughters: this was meant as a sincere gesture of friendship, although I am not sure whether or not his intention was clearly understood: I hope so. Arabic and Hebrew are closely related Semitic languages. In Pakistan and amongst Indian Muslims, people often bid goodbye to each other with the Persian phrase Khuda hafiz, which means “God protect you”. It is evident that Islam is essentially no more violent a religion than Christianity or Judaism.

In India, and previously in what is now Pakistan before the Partition, Muslims, Hindus and Sikhs greet each other (and greeted, in Pakistan) with the secular Hindustani (now Urdu or Hindi) phrase adaab arz hay “respect is expressed” or more simply and more informally adaab “respect”. (There is only one little-known border area of Pakistan where Hindus still remain, on the edge of the Thar desert.) Muslims accompany this with a particular gesture of respect, raising the open hand with the fingers to the right temple and moving it down and towards the addressee (never showing the palm side, which would be disrespectful). Respect and its expression is an extremely important socio-cultural activity in India and Pakistan.

Demonising individuals such as Osama Bin Laden has backfired monumentally upon the West. In a few years, the western media has raised an obscure terrorist to the status of international master terrorist, and in so doing they have done his work for him far better than he could have done himself from the very fringes of the world. Now that he is dead, they have created an icon for the violent, misled minority of Islam’s youth, striving against the damaging effects of western geo-political and cultural imperialism on their lives and communities. Yet these are inherently peaceful, tolerant societies of ordinary, decent people.

In the interests of peace between the West and the Islamic world, we ought to be focussing instead on the Arab Awakening (the term “Arab Spring” is not appreciated in the Arab world, and is not for example used on Al-Jazeera TV), not on creating our own enemies in groups such as Al-Qaeda. The Arab Awakening is a vernacular, non-Western movement for freedom and democracy, rather than the traditional imposition of Western-style democracy from the outside. The West, if it were to act wisely, would do best to support the Arab Awakening rather than continuing to prop up the oil magnates who were arbitrarily created as royalty by the British from amongst their tribal allies during the days of empire.

The greatest act towards peace would be the one that the West should have made decades ago, and which would have saved tens of thousands of lives: they should have ended the war in Palestine by supporting the Two State Solution. If Palestine and Gaza are genuinely free and the war with Israel is ended on reasonable terms, one of the greatest ever motivations for war between the West and the Islamic world will be removed at a stroke. Together with long-term support for the Arab Awakening and withdrawal from imperialist activities in Iraq and Afghanistan, it is possible to establish a long-term and lasting peace. But all countries must look past their immediate political interests and realise where their real interests lie. Otherwise many more people will die.

A-salaam-o-alaikum.


Troménie

What is the origin (etymology) of this word?

tro minic’hi “tour de (lieu et terre d’)asile monacale”
i.e. “circumambulation of monastic refuge or monastery”

The word minic’hi is derived from manac’h “monk” (pl. menec’h). A phenomenon known as secondary i-affection has caused vowel raising in an earlier (already i-affected) form like *menehi, which accounts for the discrepancy between the forms.

The best known event described as tro minic’hi (troménie) occurs once every six years and involves walking about twelve miles. It occurs at Menez-Lokorn, which is a hill at Lokorn (Locronan) in Bro-Gerne(v) (Cornouailles), in the département of Finistère, not far from Douarnenez. However, the word is not etymologically derived from the word menez “mountain, hill”. It occurs at one of a type of Catholic festival called a pardon, in this case dedicated to St Ronan (not all pardonioù include such processions), but was apparently preceded by a pagan event involving a similar ritual. There are or were a few other similar events, some of which are now defunct.

The reason for my publishing this no doubt widely known piece of information is that I remember being asked about it by my Breton class, at which time I was hazy on the details. I’m making amends by clarifying the etymology here! 🙂


What is ePub?

This post was originally published on the Application Profiles Support blog at UKOLN.


ePub is a standard packaging format designed for ebook readers. Here is the definition given in the entry in Wikipedia:

[…] ePub […] is a free and open e-book standard by the International Digital Publishing Forum (IDPF). Files have the extension .epub.

EPUB is designed for reflowable content, meaning that the text display can be optimized for the particular display device used by the reader of the EPUB-formatted book. The format is meant to function as a single format that publishers and conversion houses can use in-house, as well as for distribution and sale.

That is to say, an ePub file contains within it the Open Packaging Format (for convenience, we can ignore the other structural parts for the purposes of this discussion), which defines the structure of both the metadata for the item contained within the file and the presentational (XML, XHTML, CSS) elements of the standard. It is similar in many ways to a .docx file (MS Word 2007 onwards) in being effectively a specialised type of .zip file.
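
Because an .epub file is effectively a zip archive with a prescribed layout, its structure can be inspected with entirely standard tools. As a minimal sketch, assuming a file called book.epub (an invented name), the following Python snippet lists the archive’s contents and reads the container file that points to the OPF package document:

import zipfile

# An .epub is a specialised zip archive; "book.epub" is a hypothetical file.
with zipfile.ZipFile("book.epub") as epub:
    # List everything in the package: content documents, CSS, images, etc.
    for name in epub.namelist():
        print(name)
    # META-INF/container.xml points to the OPF package document,
    # which holds the publication-level metadata discussed below.
    print(epub.read("META-INF/container.xml").decode("utf-8"))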

So why is ePub of interest from the point of view of metadata and application profiles? The IDPF’s Open Packaging Format gives this description:

Dublin Core metadata is designed to minimize the cataloging burden on authors and publishers, while providing enough metadata to be useful. This specification supports the set of Dublin Core 1.1 metadata elements (http://dublincore.org/documents/2004/12/20/dces/), supplemented with a small set of additional attributes addressing areas where more specific information is useful. For example, the OPF role attribute added to the Dublin Core creator and contributor elements allows for much more detailed specification of contributors to a publication, including their roles expressed via relator codes.

Content providers must include a minimum set of metadata elements, defined in Section 2.2, and should incorporate additional metadata to enable readers to discover publications of interest.

In which case, how is the metadata contained within ePub any different to Dublin Core 1.1? This is the interesting part:

Because the Dublin Core metadata fields for creator and contributor do not distinguish roles of specific contributors (such as author, editor, and illustrator), this specification adds an optional role attribute for this purpose. See Section 2.2.6 for a discussion of role.

To facilitate machine processing of Dublin Core creator and contributor fields, this specification adds the optional file-as attribute for those elements. This attribute is used to specify a normalized form of the contents. See Section 2.2.2 for a discussion of file-as.

This specification also adds a scheme attribute to the Dublin Core identifier element to provide a structural mechanism to separate an identifier value from the system or authority that generated or defined that identifier value. See Section 2.2.10 for a discussion of scheme.

This specification also adds an event attribute to the Dublin Core date element to enable content providers to distinguish various publication specific dates (for example, creation, publication, modification). See Section 2.2.7 for a discussion of event.

Using these additional attributes, it is possible to define more precisely what certain fields contain: roles for agents; a standard, normalised form for agent metadata such as personal names; schemes identifying the system or authority that generated or defined an identifier value; and events describing more accurately what has occurred during the life cycle of the item. By applying such constraints, which are beyond the scope of DC 1.1, the ePub format effectively contains a de facto application profile, identified by its own namespace. Further, ad hoc metadata can be added using the (X)HTML meta element:

One or more optional instances of a meta element, analogous to the XHTML 1.1 meta element but applicable to the publication as a whole, may be placed within the metadata element […]. This allows content providers to express arbitrary metadata beyond the data described by the Dublin Core specification. Individual OPS Content Documents may include the meta element directly (as in XHTML 1.1) for document-specific metadata. This specification uses the OPF Package Document alone as the basis for expressing publication-level Dublin Core metadata.

It would seem, however, that this last option suffers from the weakness that such metadata is invented on the fly, and does not have to follow the constraints of any schema or authority.
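
To make the above concrete, here is a sketch of what an OPF metadata block using these attributes might look like. The title, names, identifier and date are invented for illustration; the relator codes aut (author) and ill (illustrator) come from the MARC relator list that the specification references for the role attribute. The XML is wrapped in a short Python snippet simply to check that it is well formed:

import xml.etree.ElementTree as ET

# A hypothetical OPF metadata block: Dublin Core 1.1 elements carrying the
# opf:role, opf:file-as, opf:scheme and opf:event attributes described above,
# plus an arbitrary meta element of the kind the specification also permits.
OPF_METADATA = """\
<metadata xmlns:dc="http://purl.org/dc/elements/1.1/"
          xmlns:opf="http://www.idpf.org/2007/opf">
  <dc:title>An Invented Example Title</dc:title>
  <dc:creator opf:role="aut" opf:file-as="Surname, Forename">Forename Surname</dc:creator>
  <dc:contributor opf:role="ill">Another Invented Name</dc:contributor>
  <dc:identifier opf:scheme="ISBN">978-0-00-000000-0</dc:identifier>
  <dc:date opf:event="publication">2010-12-01</dc:date>
  <dc:language>en</dc:language>
  <meta name="price" content="0.00"/>
</metadata>
"""

ET.fromstring(OPF_METADATA)  # raises ParseError if the XML is malformed
print("well formed")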

Nonetheless, it would seem overall that the ePub “application profile” does significantly add to the functionality of DC 1.1 in a potentially useful way. The different types of agent defined in DC, such as creator and contributor, can be further specified, for example as author, editor, illustrator, or thesis supervisor for higher degrees. Potentially, this could be leveraged for use with a number of different types of resources and for various purposes, although ePub by its very nature is designed for reflowable content, which by and large means textual resources such as books, articles, manuals and so on. Illustrations, tables, charts, images and other non-reflowable content can create problems on the small screens of mobile devices such as ebook readers.

The structure of this application profile is very simple and easy to use, unlike, for example, the classic form of SWAP, whose structure is based directly upon its conceptual data model, a simplified version of FRBR. It would be extremely interesting to compare the two, since they are fundamentally similar: relatively simple solutions that are limited in scope to online publications and similar resources. It would be most revealing to see whether what SWAP seeks to achieve can be done in a simpler way, and whether either SWAP or the ePub application profile has functionality that the other cannot provide.

Ultimately, the purpose of this investigation could be to provide online textual content, for example in repositories, via increasingly popular hand-held devices, and to capitalise on the rapid growth of commercial ebooks. It would probably be necessary for such systems to provide .epub files alongside the usual .pdf and .doc(x) formats that are common in publishing, and consequently in institutional repositories. This could be done by converting existing content (and new content after it is deposited), or additionally by providing tools that make the ePub format more immediately accessible to service providers and depositors in future.
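
As a sketch of what such a conversion step might look like, assuming the pandoc converter is installed and that a deposited item is available as HTML (both file names here are invented), something along these lines could be run at deposit time:

import subprocess

# pandoc infers the output format from the .epub extension and packages
# the HTML content, with its metadata, as an ebook. The file names are
# hypothetical examples only.
subprocess.run(
    ["pandoc", "deposited_article.html", "-o", "deposited_article.epub"],
    check=True,  # raise an error if the conversion fails
)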

[Ed.: details of event in December 2010 not copied from original post]


pisc

Gollum’s riddle from The Hobbit

pisc gant j r r tolkien in brittonec pi in combraec

di anatl is biu
oir marb it iu
bitt hep siccet id ib
armguisc tav ai tib

For those who don’t know Brythonic or Old Welsh, this is my rather loose verse translation of a riddle posed by Gollum to Bilbo Baggins in the chapter “Riddles in the Dark” of The Hobbit by J.R.R. Tolkien (1937). The answer to the riddle is “Fish”. The archaic orthographical convention of writing voiced and spirantised sounds with their unvoiced and unspirantised equivalents is followed, meaning that one initial consonant mutation is also left unwritten; and majuscule (upper case) letters would not have been mixed with minuscule (lower case) as they are now. The letters u and v are interchangeable. No words happen to occur that would differentiate Brythonic from Old Welsh.

I think this is rather reminiscent of Old Irish poetry, which is pleasing.

It would work almost as well in modern literary Welsh with remarkably few alterations other than the orthography (just two verbal particles; or, more accurately from the perspective of the older languages, one verbal particle and the loss of an infixed relative pronoun). The copula ys is now effectively defunct in modern Welsh, however, surviving only as a vanishingly rare literary impersonal. The preposition di must now be taken as an adjectival prefix.

Di-anadl ys byw,
Oer marw ydyw;
Byth heb syched yr yf;
Arfwisg taw a dyf.


Practical metadata solutions using application profiles

This post was originally published on the Application Profiles Support blog at UKOLN.

Past and present

Up until the present, a number of application profiles have been developed by various metadata experts, with the support of the JISC, with the intention of addressing the needs of practitioners and service providers (and thus ultimately their users) across the higher education sector in the UK. The most significant of these have been aimed at particular resource types that have an impact across the sector.


Their names indicate the approach that has been taken to date, e.g.:

  • SWAP – Scholarly Works Application Profile
  • IAP – Images Application Profile
  • GAP – Geospatial Application Profile
  • LMAP – Learning Materials Application Profile (scoping study only: also the DC Education AP)
  • SDAPSS – Scientific Data Application Profile Scoping Study
  • TBMAP – Time-Based Media Application Profile

Problems with this approach

However, it cannot be said that a particular resource type, set of resource types, or even general subject domain actually constitutes a real, identified problem space facing large sections of the information community in the UK higher education sector today. Geospatial resources can be any type of resource that has location metadata attached (e.g. place of creation, or location as the subject of the resource). Learning materials can be any type of resource that has been created or re-purposed for educational uses, which can include presentations, academic papers, purpose-made educational resources of many types, images, or indeed almost anything else that could be used in an educational context, to which metadata describing that particular use or re-use has been attached. Images might carry all sorts of different types of metadata: for instance, images of herbs might need very different metadata from images of architecture. The same applies to time-based media: what is the purpose of these recordings and what are they used for? Why and how will people search for them? Likewise, the type of science in question, of which there are almost innumerable categories and sub-categories, will to a large extent determine the specific metadata that will be useful for describing scientific data.

Of all of the above, only scholarly works, which might more usefully be called scholarly publications, form an entirely focussed, specific set of resource types with a common purpose. The others are loose and sometimes ill-defined collections of resources or resource types that fit into a particular conceptual category. Only in the case of scholarly publications is there an implicit problem space: discovery and re-use in repositories and similar systems, usually but not exclusively as Open Access resources. There are other related problem spaces, such as keeping accurate information about funders and projects for the purposes of the auditing required by funding bodies and university authorities. The ability to access these resources with new technologies could be a further area of study, and is one that UKOLN is taking an active interest in. Again, the question must be: what do users want to do with these resources?

Current approaches

This is not to say that the work put into creating the application profiles mentioned above has been wasted. At the same time, these application profiles constitute general-purpose solutions that do not target specific problems affecting identifiable communities of practice across the sector. Considerable work continues in Dublin Core Metadata Initiative (DCMI) circles on how metadata modelling should best be carried out, for instance on the Dublin Core Abstract Model (DCAM) and on the overlap between application profiles and linked data, where application profiles contain relationships that can better enable resource discovery in a linked data world.

New approaches

These approaches remain useful. However, more immediate, specific problem spaces face particular university services (not all of which are necessarily repositories): describing resources so that they can be discovered, providing copyright and other licensing information so that they can be re-used, providing funding information so that work can be audited and cases can be constructed for funding new projects, and so on. Some of these resources may be textual, but they increasingly include images (of many types and for many purposes), music, film, audio recordings, learning objects of many types, and large-scale corpora of data. Any metadata solution that is tailored to a particular purpose (and which is thus usually a de facto application profile) needs to address the specific aspects of the Web services that practitioners and other service providers are seeking to develop for their users, not simply provide general catch-all metadata of relatively generic use.
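
As a loose illustration of the kind of record such a tailored solution might produce, here is a minimal sketch using Python and the rdflib library. The item, licence and funder values are all invented, and the ex:funder property stands in for whatever term a real profile would mandate (the Dublin Core 1.1 element set itself has no funder element):

from rdflib import Graph, Literal, Namespace, URIRef
from rdflib.namespace import DCTERMS

# Hypothetical namespace standing in for the vocabulary a real
# application profile would specify for funder information.
EX = Namespace("http://example.org/terms/")

g = Graph()
item = URIRef("http://example.org/repository/item/1")  # invented identifier

# Descriptive, licensing and funding metadata side by side: it is the
# chosen mix of properties and constraints, rather than any single term,
# that makes this a de facto application profile.
g.add((item, DCTERMS.title, Literal("An Invented Paper Title")))
g.add((item, DCTERMS.license, URIRef("http://creativecommons.org/licenses/by/2.0/uk/")))
g.add((item, EX.funder, Literal("An Invented Funding Body")))

print(g.serialize(format="turtle"))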

Key to all this is consultation with those communities: first, to scope the most significant two or three problem spaces that face the largest number of resource providers in serving their users; second, to bring those practitioners together with developers to draw up practical, workable recommendations and perhaps demonstrations; third, to provide tangible evidence to the developers of existing software platforms, and to engage with them to help solve such problems in practice. To do this, it is necessary to engage practitioners and developers in practical, hands-on activities that can move the discussion forward and provide tangible solutions.


Ar vrezhonegerien ziwezhañ? (The last Breton speakers?)

I was born in 1977, and it was in that year, by coincidence, that the Diwan schools were founded. Only around 1.45% of schoolchildren in Brittany learn through the medium of the language; that is, taking the Diwan, Dihun and Div Yezh schools together. Clearly, that is not enough for the future of a living language.

Percentage of Breton speakers in the regions (broioù) of Brittany in 2004

When I was in Bro-Wened (the Vannes country) recently, I heard not a word of Breton except between ourselves, my companion and me, and once from the owner of a Breton bookshop in Gwened (Vannes). Every time I have been to Brittany the same thing has happened. The language has gone completely into hiding in its own country, and Bretons are surprised to hear it. Not even all the younger people recognise the language. You already have to know Breton speakers in order to speak the language, and to know where to go to meet them.

When you go to a fest-noz these days, more and more people know how to dance in the Breton style, play the binioù and so on, and they know a great deal of Breton music too. Some can sing in Breton even if they cannot say anything else in the language. It is frightening, all the same, how rare good Breton speakers have become.

According to the website oui au breton, only 220,000 Breton speakers remained in 2006, and 10,000 are being lost every year: the figure could therefore be as low as 180,000 today. If these numbers are accurate, and if the decline continues at this rate for the time being, the language will be lost this century. The dialects would be lost sooner still, above all the most distinctive of them, Gwenedeg (the Vannes dialect), which is the weakest. Obviously, though, it cannot keep losing blood that quickly, for in 2028 today’s Diwan pupils will still be quite young.

An old boat beside the Mor Bihan

My Breton-speaking friends and I will not be the last to speak the language. It is today’s schoolchildren who will speak its last words as a living language, unless the tide running against Breton turns at once. If that does not happen soon, it will be too late. Or is it already?
