CREATIVE COMMONS TECH SUMMIT
I wrote this summary of the first Creative Commons tech summit, held in 2008 at the Googleplex, but somehow it sat as a draft in WordPress for more than a year. I decided to add links and publish it, although two follow-up conferences have been held in the meantime. Maybe it still holds some value. Enjoy my rather lengthy ‘summary’.
Thanks to Creative Commons.org and The institute of Sound and Images I was able to join the first Creative Commons Tech Summit in San Francisco. Since the number of participants was limited to 100, I decided to write a summary on this blog for those not able to attend. Others have written about the summit as well, and the whole event is available on YouTube. I hope this summary will be of use to you. Feel free to comment. More after the jump.
Joi Ito – CEO of Creative Commons – welcomes the participants of the first Creative Commons Tech Summit with a short keynote. In it he reflects on the start of the Internet and the first fights to keep the Internet an open place for anyone, regardless of politics. He does this by referring to the old discussions about using the (open standard) TCP/IP protocol instead of a closed proprietary standard. He uses interoperability as a starting point to explain his view on Creative Commons, which consists of two points:
- Creative Commons is open for anybody, even those Creative Commons might disagree with. Anyone should be able to use the licenses or implement the technology developed by CC.
- Creative Commons does have a political agenda, but pursues it in a very pragmatic manner. Creative Commons strives towards openness and sharing. CC hopes that the use of the licenses and the technology will nudge those who are not yet so open or willing to share towards a more open and sharing attitude.
After this short keynote Nathan Yergler – CTO of Creative Commons – introduces Ben Adida, Creative Commons’ W3C representative. Adida has for quite some time been working on the Creative Commons Rights Expression Language (ccREL) as a replacement for the machine-readable metadata used in the past, which worked but was a bit of a dirty solution. Adida starts off with an explanation of the different CC license layers. Every CC license consists of three layers:
- A so-called human-readable version of the license, which explains the license and its conditions in layman’s terms.
- A so-called lawyer-readable version of the license, which explains the license and its conditions down to the nitty-gritty details in lawyer’s jargon.
- A machine-readable version, which consists of a piece of text/code easily parseable by machines.
Why is there a machine-readable version, asks Adida rhetorically. Because we want software (‘machines’) to be able to assist and advise us in the use of CC licenses. There are numerous ways in which this can be done. For instance, using the machine-readable licenses, we can ask software to return only those works from a specific author that allow commercial use. The old techniques allowed this as well, but they had several flaws, which Adida points out in his presentation. In general the old methods made it easy to make mistakes. They also meant that you had to repeat certain information in order to make it useful for both people and machines.
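To make the "only works that allow commercial use from a specific author" query a bit more concrete, here is a minimal sketch in Python. It is not something shown at the summit. The catalogue, titles and author names are made up, and reading the license elements from the URL is a shortcut that assumes the usual `…/licenses/by-nc-sa/3.0/` shape of CC license URLs. Software that follows ccREL would read the license's own machine-readable description instead.

```python
# Hypothetical catalogue: titles, authors and entries are invented for illustration.
WORKS = [
    {"title": "Field Recording", "author": "Alice",
     "license": "http://creativecommons.org/licenses/by/3.0/"},
    {"title": "Night Photos", "author": "Alice",
     "license": "http://creativecommons.org/licenses/by-nc-sa/3.0/"},
    {"title": "Remix Pack", "author": "Bob",
     "license": "http://creativecommons.org/licenses/by-sa/3.0/"},
]


def license_elements(license_url):
    """Return the license elements ('by', 'nc', 'sa', 'nd') found in a CC license URL."""
    marker = "/licenses/"
    if marker not in license_url:
        return set()
    # ".../licenses/by-nc-sa/3.0/" -> "by-nc-sa"
    code = license_url.split(marker, 1)[1].split("/", 1)[0]
    return set(code.split("-"))


def allows_commercial_use(license_url):
    """True when the URL is a recognisable CC license without the 'nc' element."""
    elements = license_elements(license_url)
    # Unknown licenses are treated conservatively as "not allowed".
    return bool(elements) and "nc" not in elements


def commercial_works_by(author, works):
    """Titles of works by `author` whose license allows commercial use."""
    return [w["title"] for w in works
            if w["author"] == author and allows_commercial_use(w["license"])]


print(commercial_works_by("Alice", WORKS))  # ['Field Recording']
```

The point is only that once the license is available as data, this kind of question becomes a simple filter.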
By combining content and meta information in a way that is visual, human-readable AND machine-readable, we only have to write it once, which makes it easier and less error-prone. For example, a title written for people should also be what the ‘machines’ use to gather the title information of a work. ccREL aims to do this. It describes both the work and the license used, using HTML in combination with RDFa. There is a common base set of fields to be filled in, such as attribution name, title, license etc. The base set is, according to Adida, designed to be extended with your own particular fields. Using ccREL lets us work towards the semantic web (data web), in which services, information and content can be interpreted and used by machines in numerous ways. Adida briefly mentions that there are also other ways to use ccREL (Adobe’s Extensible Metadata Platform, aka XMP) in case you cannot or don’t want to use HTML. More information about ccREL can be found in the paper written by Adida et al. (PDF).
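As an illustration of the "write it once" idea, the sketch below holds a small piece of HTML with RDFa attributes in the style of ccREL and pulls the same title, attribution and license out of it with Python's standard `html.parser`. The work title, author name and example.org URL are invented. The parser class is a hypothetical helper that only handles this simple pattern. It is not a real RDFa processor and not code from Creative Commons.

```python
from html.parser import HTMLParser

# Invented example markup: the visible text doubles as the machine-readable values.
SNIPPET = """
<div xmlns:cc="http://creativecommons.org/ns#"
     xmlns:dc="http://purl.org/dc/elements/1.1/">
  <span property="dc:title">Summit Notes</span> by
  <a rel="cc:attributionURL" property="cc:attributionName"
     href="http://example.org/alice">Alice Example</a>
  is licensed under a
  <a rel="license"
     href="http://creativecommons.org/licenses/by/3.0/">Creative Commons Attribution 3.0 License</a>.
</div>
"""


class CcrelSketchParser(HTMLParser):
    """Collects rel/href pairs and property/text pairs; a sketch, not full RDFa."""

    def __init__(self):
        super().__init__()
        self.found = {}
        self._pending = None  # property name still waiting for its text value

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        rel = attrs.get("rel")
        if rel and "href" in attrs:
            self.found[rel] = attrs["href"]
        if "property" in attrs:
            self._pending = attrs["property"]

    def handle_data(self, data):
        # The first non-empty text after a property attribute is taken as its value.
        if self._pending and data.strip():
            self.found[self._pending] = data.strip()
            self._pending = None


parser = CcrelSketchParser()
parser.feed(SNIPPET)
for key, value in sorted(parser.found.items()):
    print(key, "=", value)
```

A person reading the page sees "Summit Notes by Alice Example", and the software reads the very same text as the title and attribution name, so nothing has to be repeated in a separate metadata block.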
Following Adida is a panel in which Asheesh Laroia, John Wilbanks and Nathan Yergler talk more about the CC-specific technology initiatives.
Nathan Yergler talks about ccREL and how it has already been put into practice in the CC license chooser and the CC+ possibilities. The license chooser now makes it clearer that adding extra (although still optional) information makes it easier for people to give attribution in the right way or to get more info about a specific work or author. It also allows you to add information on where a user can clear rights that are not pre-cleared by your CC license. The latter is also known under the name CC+.
John Wilbanks – Vice President of Science Commons – is next. His presentation is a very quick (about 10 minutes) overview of the Science Commons initiatives derived from Creative Commons. He notes that science has quite different needs compared with culture. According to Wilbanks, Science Commons is about rights such as reference, extending and integrity. Science Commons aims to make the already existing databases full of scientific data interoperable and more accessible so data can be easily shared. The presentation made it hard to grasp what exactly Science Commons was doing and how developers or others interested in Science Commons could use or contribute to it.
Finally Asheesh Laroia – Software Engineer at Creative Commons – gave a rather minimal presentation about liblicense, the software built upon the Adobe XMP SDK for reading and writing license information in a wide range of media formats. Without any examples or a demo it was quite hard to get an idea of the benefits of using this library. A pity, since I happen to know what it can do and how it may save precious time.
After a short break the summit continues with a panel consisting of Gunar Penikis, Lucas Gonze and Stephen Lau on digital asset management on the web and desktop.
Gunar Penikis – Product Manager for Adobe’s Extensible Metadata Platform (XMP) – talks about XMP, an open, standards-based technology for the capture, preservation, and interchange of metadata across digital media. XMP makes it easy to keep track of digital assets without getting in the way. It allows metadata to be written and read in a wide range of media formats and is integrated in the Adobe applications, making it easy for creatives to keep track of digital assets. Creative Commons’ Liblicense makes use of the Adobe XMP SDK, which is licensed under the BSD license.
Stephen Lau – Developer Evangelist at Songbird – talks about integrity and the different types of metadata. He makes a distinction between local subjective metadata, such as your favorite songs, and global objective metadata, such as the artist of a song. Songbird needs to deal with both and show them to the user, while also taking the integrity of the data into account. Interestingly, Songbird can parse microformatted data and combine it with the metadata already stored in the file.
Lucas Gonze – ex-Yahoo Music – talks about the web of songs, in which every song has its own unique single URL. His presentation was more of a philosophical and visionary nature. If every song were retrievable using the web, “music could become a first world region of the web”. This could be beneficial for artists, labels and music lovers alike. It reminded me of the whole short links ecosystem in use by Twitter and mobile applications, where the limited number of characters in a text message calls for a link that is as short as possible. One could easily use a similar system to make music behave in a more web-like manner.
After the panel Mike Linksvayer – Vice President of Creative Commons – gave a talk titled “Digital Copyright Registry Landscape”. In it he gave an overview of the (perceived) need for registries in the digital domain and the existing solutions and future challenges. He starts with an interesting historical quote from the Creative Commons FAQ shortly after it was launched:
“Is Creative Commons building a database of licensed content? Absolutely not. We believe in the Net, not a centralized, Soviet-style information bank controlled by a single organization.[...]”
Now Creative Commons has organized a summit to discuss precisely the possible uses of such a database. It illustrates the change of heart Creative Commons has had towards copyright registries. For now Linksvayer gives some possible arguments why registries might be useful and why they want to explore the copyright registries concept further:
- Dealing with orphaned works
- Dealing with provenance, ownership and authenticity of works
- “Eat your own dogfood” type of proof of the CC developed technology which should allow you to create a copyright registry.
The demand for registries more or less confirms this: the list Linksvayer sums up (UGC upload filtering, license management, media organization, collective rights management, cultural heritage, tracing content location and timestamping) mainly consists of identification issues such as ownership, provenance and authenticity of content.
He also described several types of registry with examples: built solely as a registry (Registered Commons, Safe Creative), built upon existing data such as an archive (Open Library, Musicbrainz), built for internal use or to offer to others as a service (Noank Media, not sure how this relates to the first type of registry) and arising as a side effect of a different type of service, for instance a metadata database (Jamendo, Attributor). The registries named above also gave presentations later that day.
Linksvayer continues with some of the challenges a registry faces:
- Reliably identifying works
- Reliably identifying owners
- Namespace monopolists (Identifiers are only obtainable for a fee)
- Making it webby
- Benefits vs. costs (who pays and how much?)
- Scams
- Metacrap (incomplete or wrong metadata)
Next to these challenges there are also the challenges of supporting the commons, such as making a registry interoperable and semantic web enabled, offering open services using open standards and free software, and dealing with public licenses. Linksvayer summarized it perfectly, in my view, with the last slide, in which he states that the Web is “the” registry and asks the upcoming presenters of the different registries: What does your “registry” add to the web? Mike’s talk set the context for the rest of the afternoon, in which some of the registries mentioned in his talk presented themselves to the audience.
Devon Copley – CTO of Noank Media – was the first speaker to talk about their type of registry. They want to solve the problem (at least as seen from the perspective of content holders and creators) of ‘unlicensed P2P’ and get creators paid for their work. Copley states that ‘unlicensed P2P’ benefits nobody: not the creators/content holders, not the ISPs and not the end-users. A bold statement, but the arguments he uses to support it, such as “unpopular content is hard to find”, “download performance is poor as ISPs restrict P2P bandwidth” or “ISPs want to promote green isp services with licensed content”, are in my opinion questionable and weak. The only ones ‘suffering’ from ‘unlicensed P2P’ are the content holders / creators. He goes on to present their solution, which is to use ISPs as gatekeepers which end-users pay to gain access to content. The ISPs will distribute the revenue back to the content holders. Since the problem only exists for 1/3 of the stakeholders, I doubt this model will solve the problem. After this he goes into more detail on how to support this ‘solution’ using their platform.
Next is Robert Kaye – Musicbrainz –
Musicbrainz started as an alternative to Gracenote. Musicbrainz is all about music metadata. No metadata, no findability. Without good metadata the data might as well not exist. Do it wrong and you get metacrap, and you’ll get into liability trouble. For a copyright registry metadata is paramount: without metadata you cannot find the content. Web of data.
Joe Benso – Business Developer at Registered Commons –
Registered Commons is a content registry where companies can verify content for commercial purposes. Benso finds transparency important for a registry. It is a free service to register works of any type, with a timestamp service (using A-cert). CAcert is used for user trust certification as the highest level, and email verification is the lowest form of trust verification. They also allow users to add limitations on the usage of the work by using moral rights. They position it as a service for new business models.
Javier Prenafeta – Safe Creative –
Safe Creative is similar to Registered Commons and also attempts to solve the questions regarding ownership of a work, the license of a work and changes in the work. Unfortunately the speaker is not a native English speaker and is somewhat hard to understand. As far as I could understand, they are using a system similar to that of Registered Commons. Both operate as a register which can lower the chance of copyright issues and thus liability, something which came up a lot during this summit.
Rich Pearson – Attributor.com –
Attributor monitors registered content using crawlers. As far as I understood, the difference between them and Safe Creative or Registered Commons is that they are not only a registry, but also somewhat of a watchdog for the registered content. This allows them to identify new licenses, ads shown in relation to the content (and thus check for license compliance) and so forth. He ends his talk with a short summary of what a registry should contain and how this relates to their service.
Aaron Swartz – Open Library –
Open Library is a project initiated by the Internet Archive. It’s a website with a page for every book ever published. Twenty million books are indexed at this moment, and their pages can be changed in a wiki-like manner. The interesting aspect of this project, in my opinion, is a side-project with CC in which they try to calculate the current copyright status of a book, so you can tell if it’s in the public domain or not. They are also combining forces with Mediawiki so they can combine Open Library with Wikipedia.
Pierre Gerard – co-founder Jamendo –
Jamendo is a CC-licensed music sharing website similar to Simuze. They allow artists to upload their work and listeners to download music for free. They have a large amount of free music (approx. 10,000 albums) and are backed by venture capitalist Mangrove Capital Partners (Skype). They share the ad revenue with the artists (50%) and try to create partnerships with commercial entities. They act as a sort of stock catalogue and also have to deal with registry-like issues such as authenticity and ownership of a work and creator. At the moment they are still sorting this out.
Panel discussion:
I haven’t written down everything said during this session, but I tried to gather the bits that I found interesting. I’ve tried to gather the names of people, including any links to them on the Web. In some cases I have forgotten or could not hear the name, and thus no info is presented on these people.
Nathan Yergler starts by asking if the panel has been approached by content creators to supply Musicbrainz or any of the other panel members with metadata. Kaye states that most labels are not very good at keeping inventory and that their data in general is too crappy for his community. Swartz states that the book world is a lot better and that Open Library actually gets metadata for new books every week. Copley asks Pearson about their platform: whether they are using something which allows content creators to add information for commercial licenses, such as CC+, and how they have implemented or are implementing something like this. They are interested in this, but are implementers and are not working on inventing something in this realm on their own. ccREL might be a possible option according to Copley, and it seems that Yergler agrees but remarks that it was not targeted at this particular use. From the audience there is a question regarding the use of Attributor in the academic world. Pearson states that they have not yet looked into the specific needs of the academic world.
Wendy Seltzer – Berkman Center – asks about the possibility of assurance in relation to the level of confidence in the copyright status of a work, like a clearance service. Some entities would like to have this so they can be very sure that they will not be sued for using the work. In other words liability comes up again, which was and seems to be a hotter topic in the USA than in Europe. Interestingly, none of the projects mentioned by the panel members offers this service. They seem to feel a bit uneasy about the question and in fact do not seem to want to offer such a service due to the high risks. In my view the use of a (commercial) registry becomes questionable if it cannot or will not take this risk. Kaye seems to share this view and points this out; sadly there was no follow-up on this.
Longevity is a question posed by Riana Pfefferkorn. What if a registry disappears or goes bankrupt? What happens with the registry data? The more open ‘registries’ (Musicbrainz and Open Library) state that they provide dumps and allow the data to be transferred by third parties. Of the more commercial registries only Registered Commons answers that they have a record on file (paper), and Benso also mentions their ties with the University but keeps this vague.
Brion Vibber – Wikimedia Foundation – asks about the possibilities for a feasible method to distinguish between material that is almost certainly all rights reserved and content which may be freely shared. Since most content-based fingerprinting is easily ‘circumvented’ by using lossy formats and changing just a few bits, for instance through resizing, he wonders what other options are available. The panel responds that the best algorithms all seem to be locked up in proprietary systems and that their inner workings are not very well known.
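To illustrate why "changing just a few bits" defeats naive fingerprinting, here is a toy sketch of my own, not any panel member's algorithm. It compares an exact SHA-256 fingerprint with a very crude "average hash" on a made-up 4x4 grayscale image. The pixel values and the simulated re-encoding are invented, and real robust fingerprinting is far more involved than this.

```python
import hashlib


def exact_fingerprint(pixels):
    """Cryptographic hash of the raw pixel bytes: any changed bit gives a new digest."""
    return hashlib.sha256(bytes(pixels)).hexdigest()


def average_hash(pixels):
    """Toy perceptual hash: one bit per pixel, '1' when brighter than the mean."""
    mean = sum(pixels) / len(pixels)
    return "".join("1" if p > mean else "0" for p in pixels)


# Hypothetical 4x4 grayscale image, flattened row by row (dark left, bright right).
original = [10, 20, 200, 210,
            15, 25, 205, 215,
            12, 22, 202, 212,
            18, 28, 208, 218]

# Simulate a mild lossy re-encode by nudging every pixel up or down by one.
reencoded = [p + 1 if i % 2 == 0 else p - 1 for i, p in enumerate(original)]

print(exact_fingerprint(original) == exact_fingerprint(reencoded))  # False
print(average_hash(original) == average_hash(reencoded))            # True
```

The exact fingerprint no longer matches after the tiny change, while the coarse one still does. That tolerance is what a registry would need, and according to the panel the serious versions of it sit in proprietary systems.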
Luis Villa asks Registered Commons about their use of moral rights in their system. Benso states that it basically allows content creators to add extra information or limitations to their content besides the license.
After this session there was a break, followed by the closing session led by Ben Adida. In this last session, a plenary discussion, Adida poses the following questions for the audience to consider:
- What do we need out of copyright 2.0 / registry 2.0?
- What collaborative technology efforts are needed?
- What role should CC not play?
- What role should CC play?
Gunar – Adobe – states that copyright 2.0 is about trust. Personally I think he is right, considering the ongoing remarks on authenticity, provenance, ownership and, in a certain way, liability. Another speaker remarks that trust is indeed an issue. Yet another speaker in the audience adds that he thinks the element of trust is in demand now because of the industrial model currently in use. Instead he proposes to use the Safe Harbor model to deal with trust in a ‘webby’ manner.
Hanno Kaiser advocates reforming the copyright system towards its original goal of creating incentives, with shorter terms, opt-in and a default that is more pro-derivatives for all non-commercial use. Commercial use is difficult to define, yet non-commercial use should always be permitted. Ito adds that in Japan a bill has been proposed which suggests doing just what Kaiser advocates. Under this bill all non-commercial use would be allowed without permission and all works would by default be licensed under a Creative Commons non-commercial license, so a fee would be required to use a work for commercial purposes.
Brion Vibber – Mediawiki – states that the distinction between commercial and non-commercial is quite hard to make and might even cause issues, unlike open source licenses, which do not make this distinction.
Michael Carroll – CC board – states that a copyright registry should not be merely a ‘property map’ but preferably a creativity map, and should let people register their works even if they do not want to enter the commercial market space. This is similar to the attribution requirement of the CC license or the ‘ego’ aspect of open source licenses.
Mark Graham – OER Commons – states that it is still very important to reach out and let people outside the obvious circles get into contact with CC and its licenses and get educated about the possible uses. Luis Villa acknowledges this and adds that Creative Commons licenses alone, or some platform, are not enough for people to use the licenses. People need to be educated on the usage and the pros and cons of CC licenses.
An audience member states that it is important to have an (open) standard for accessing these registries and to make them easy to use. There is some discussion on this point, but it seems that there is no consensus on how to achieve this.
I had the chance to pose my question with regard to non-commercial vs commercial and how to define this. I’m not sure if people understood my point, which was that commercial should be defined by CC, that all licenses should then be non-commercial, and that CC+ should be added as an extra option for commercial use. In hindsight this would still not solve the issue of defining what exactly commercial is.
It was an interesting conference and I’m impressed by ccREL, but I still have a lot of questions with regard to copyright registries, although I have to admit that there might be more use for them than I initially thought. Not only because of liability, authenticity, provenance or ownership, but because of findability. As Robert Kaye – Musicbrainz – stated, without metadata you can’t find the data, and in my view a registry is nothing more than a big database full of metadata.
