Specification of RCache Protocol Used by Fishweb

Protocol URI
http://purl.oclc.org/NET/WASABI/itemstore/

The RCache protocol (RCP) is used by the fishweb data store to enable management of a collection of records.

RCP uses HTTP for transport.

RCP response messages are JSON encoded objects.

An RCP server may support subsetting operations by querying an index of the content. Subsetting applies to the List and Retrieve operations.

The collection may contain any content, and basic metadata is recorded about the content as it is inserted or modified in the collection.

Operations

The following operations are defined:

(1) Operation available only for indexed collections.

Create

Creates a new item in the RCP service. If the connected user does not have write permission on the collection, then an HTTP 403 error is returned.

Method
POST
Target
http://<server>/<collection>/
create_response <- create(content, guid, type, origin=None, doindex=1)
content (required)
The payload to be inserted into the collection. It is treated as an opaque object by both the client and the server. Either side may perform additional pre- or post-processing, but those operations are not defined in this specification.
guid (required)
The globally unique identifier for the object. An error is returned if the specified GUID already exists within the RCP collection.
type (required)
The object type of the supplied item. This is used to label items within a collection as being instances of a particular type of data. It is recommended that URIs are used to identify object types.
origin (optional)
A URL pointing to information about the origin of the item. If not specified, then the URL of the RCP service collection will be used.
doindex (optional)
RCP servers may have an indexer attached. This parameter is a hint that the server may use to delay indexing of the item, which can be useful when a large volume of content is being uploaded. The server may ignore the parameter, and a server that does not support indexing must ignore it.
create_response
A JSON encoded instance of a response object.
response = {'guid': '<Full URL pointing to object>',
            'error': {'message': '<Error message>',
                      'code': '<Error code>',
                      'info': '<Additional information about the error>'},
            'message': '<Optional message with additional information>'
           };
Both 'guid' and 'error' are optional, but at least one will be in the response. The 'message' entry may contain any JSON encoded information to be passed back to the client.
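
As an illustration, a client might build the Create request and read the response roughly as sketched below. This specification does not say how the parameters are encoded in the POST body, so form encoding is an assumption here, and the helper names build_create_request and parse_create_response are hypothetical.

```python
import json
import urllib.parse
import urllib.request


def build_create_request(collection_url, content, guid, otype, origin=None, doindex=1):
    """Build (but do not send) a POST request for the Create operation.

    Assumption: parameters are sent form-encoded; the specification
    does not define the body encoding.
    """
    params = {"content": content, "guid": guid, "type": otype, "doindex": doindex}
    if origin is not None:
        # When origin is omitted, the server uses the collection URL instead.
        params["origin"] = origin
    body = urllib.parse.urlencode(params).encode("utf-8")
    return urllib.request.Request(collection_url, data=body, method="POST")


def parse_create_response(text):
    """Return the item URL from a create_response, or raise if it reports an error."""
    response = json.loads(text)
    if "error" in response:
        err = response["error"]
        raise RuntimeError("%s: %s" % (err.get("code"), err.get("message")))
    return response["guid"]
```

The request object can be passed to urllib.request.urlopen(); a 403 reply surfaces there as urllib.error.HTTPError.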

Get

Retrieves a single item from the collection. The response contains the actual content (bytes) of the item. Store-specific metadata about the item is retrieved using the getmeta() operation. If the specified item does not exist in the collection, then an HTTP 404 error is returned. If the connected user does not have read permission on the item, then an HTTP 403 error is returned.

Method
GET
Target
http://<server>/<collection>/<guid>/
get_response <- get(guid)
guid
The GUID of the item to retrieve
get_response
The item
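
GUIDs may contain characters that are not valid in a URL path (the GUID ".KU.KU Fish.1004" used elsewhere in this document contains a space). A minimal sketch of building the item URL follows, assuming the GUID is percent-encoded as a single path segment; the helper names are hypothetical.

```python
import urllib.parse
import urllib.request


def item_url(collection_url, guid):
    """Return the URL of an item, percent-encoding the GUID as one path segment."""
    return collection_url.rstrip("/") + "/" + urllib.parse.quote(guid, safe="") + "/"


def build_get_request(collection_url, guid):
    """Build (but do not send) the GET request for the Get operation."""
    return urllib.request.Request(item_url(collection_url, guid), method="GET")
```

When such a request is sent with urllib.request.urlopen(), the 403 and 404 cases are raised as urllib.error.HTTPError, whose code attribute holds the status.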

Getmeta

Retrieves collection-specific metadata about the item identified by GUID. The metadata provides attributes similar to those of a file system store, which are described under getmeta_response. If the specified item does not exist in the store, then an HTTP 404 error is returned. If the connected user does not have read permission on the item, then an HTTP 403 error is returned.

Method
GET
Target
http://<server>/<collection>/<guid>/meta/
getmeta_response <- getmeta(guid, varname=None)
guid
The GUID of the item for which metadata is to be retrieved.
varname
If set, the JSON response is returned as an assignment to a variable of that name (for example test={...}); otherwise just the structure is returned.
getmeta_response
Item metadata. Response types of RDF and JSON are supported and selected by the Accept: header sent by the client.

JSON encoded response:

response = {'metadata':
             {'identifier':'<DC identifier (GUID) of item>',
              'type':'<DC type (otype) value of item>',
              'source':'<DC origin of item>',
              'creator': '<DC creator (uid) of item - use uri pointing to member collection>',
              'created':<DC UTC time item was first inserted into store>,
              'modified':<DC UTC time item was last updated>,
              'permissions':<integer permissions flag>,
              'bytesize':<integer size in bytes of content>,
              'hash': '<md5 hash of content>'
             },
            'error': {'message': '<Error message>',
                      'code': '<Error code>',
                      'info': '<Additional information about the error>'},
            'message': '<Optional message with additional information>'
           }

RDF encoded response:

<?xml version="1.0" encoding="utf-8"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dc="http://purl.org/dc/terms/"
         xmlns:rcp="http://fishnet2.net/vocab/">
  <rdf:Description rdf:about="http://localhost:8000/items/.KU.KU%20Fish.1004/">
    <dc:identifier>.KU.KU%20Fish.1004/</dc:identifier>
    <dc:type rdf:resource="http://purl.oclc.org/NET/WASABI/darwincore/" />
    <dc:creator rdf:resource="http://localhost:8000/members/1/" />
    <dc:created>2008-06-01T10:42:11.0Z</dc:created>
    <dc:modified>2008-06-01T10:42:11.0Z</dc:modified>
    <rcp:bytesize>1234</rcp:bytesize>
    <rcp:permissions>210</rcp:permissions>
    <rcp:hash>33ee40b2e431dda458247005355bea14</rcp:hash>
  </rdf:Description>
</rdf:RDF>

(Note: investigate overhead requirements for adding a read count for each item).
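
A client-side sketch of the varname parameter and the Accept header described above. It assumes that varname is passed in the query string and that, when it is set, the JSON body is prefixed with "<varname>=". The helper names are hypothetical.

```python
import json
import urllib.parse
import urllib.request


def build_getmeta_request(item_url, varname=None):
    """Build (but do not send) a Getmeta request asking for a JSON response."""
    url = item_url.rstrip("/") + "/meta/"
    if varname is not None:
        url += "?" + urllib.parse.urlencode({"varname": varname})
    return urllib.request.Request(url, headers={"Accept": "application/json"})


def parse_getmeta_body(text, varname=None):
    """Decode a JSON getmeta body, stripping a leading '<varname>=' if present."""
    text = text.strip()
    if varname is not None and text.startswith(varname + "="):
        text = text[len(varname) + 1:]
    return json.loads(text)
```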

Example:

$ curl "http://localhost:8000/items/.KU.KU%20Fish.1004/meta/?varname=test" \
 -H "Accept: application/json"

test={"origin": "http://digir.nhm.ku.edu:80/digir/DiGIR.php", 
      "bytesize": 2761, 
      "modified": 54643.9320763, 
      "chash": "33ee40b2e431dda458247005355bea14", 
      "otype": "DwC", 
      "created": 54643.9144452, 
      "guid": ".KU.KU Fish.1004", 
      "permissions": 210}

Update

Updates an existing item with new content. The otype or origin of an item cannot be changed without deleting the existing instance and creating a new one. An HTTP 404 error is returned if the item with the specified GUID does not exist in the collection. If the connected user does not have write permission on the item, then an HTTP 403 error is returned.

Method
PUT
Target
http://<server>/<collection>/<guid>/
update_response <- update(guid, content)

guid
The GUID of the item for which content is to be updated.
content
The content of the item that will replace the existing content.
update_response
A JSON encoded response object.

response = {'guid': '<Full URL pointing to object>',
            'hash': '<md5 hash of object (used for change detection)>',
            'error': {'message': '<Error message>',
                      'code': '<Error code>',
                      'info': '<Additional information about the error>'}
           };
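
Since the response carries an md5 hash intended for change detection, a client could compare it with a locally computed hash before deciding whether to upload again. The sketch below assumes the hash is the lowercase hexadecimal md5 digest of the raw content bytes and that the content is sent as the raw PUT body; neither detail is stated in this specification, and the helper names are hypothetical.

```python
import hashlib
import urllib.request


def content_hash(content):
    """Return the hexadecimal md5 digest of the content bytes."""
    return hashlib.md5(content).hexdigest()


def needs_update(local_content, stored_hash):
    """True when the local content differs from what the store reports."""
    return content_hash(local_content) != stored_hash.lower()


def build_update_request(item_url, content):
    """Build (but do not send) the PUT request for the Update operation."""
    # Assumption: the new content is the raw request body.
    return urllib.request.Request(item_url, data=content, method="PUT")
```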

Delete

Removes an item and its associated metadata from the collection. If the item does not exist, then an HTTP 404 error is returned. If the connected user does not have write permission on the item, then an HTTP 403 error is returned.

Method
DELETE
Target
http://<server>/<collection>/<guid>/
delete_response <- delete(guid)

guid
The GUID of the item to be deleted.

delete_response
A JSON encoded response object.
response = {'message': 'OK',
            'error': {'message': '<Error message>',
                      'code': '<Error code>',
                      'info': '<Additional information about the error>'}
           };

List

Retrieves a list of metadata entries for items in the collection. The entire collection operates in a manner similar to an indexed array. However, there is no guarantee that an array index will reference the same object between calls, as insert or delete operations may have occurred in the meantime.

Method
GET
Target
http://<server>/<collection>/
list_response <- list(start=0, pagesize=1000, q=None)

start
Index of first value to retrieve
pagesize
The number of items to retrieve
q
Query that defines a subset of the collection. The syntax of the query is that of the Apache Lucene indexer.
list_response
A JSON encoded list of getmeta_response structures.

response = [{'identifier':'<DC identifier (GUID) of item>',
              'type':'<DC type (otype) value of item>',
              'source':'<DC origin of item>',
              'creator': '<DC creator (uid) of item - use uri pointing to member collection>',
              'created':<DC UTC time item was first inserted into store>,
              'modified':<DC UTC time item was last updated>,
              'permissions':<integer permissions flag>,
              'bytesize':<integer size in bytes of content>,
              'hash': '<md5 hash of content>'
             },
             {}, 
             {}, 
             ... ]
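
Paging through a collection with start and pagesize could look like the sketch below. It assumes the three parameters are passed in the query string under the names used in the signature, and that a page shorter than pagesize marks the end of the collection. The fetch_page callable and the field name in the usage line are hypothetical.

```python
import urllib.parse


def list_url(collection_url, start=0, pagesize=1000, q=None):
    """Return the List URL for one page (query-string encoding is an assumption)."""
    params = {"start": start, "pagesize": pagesize}
    if q is not None:
        params["q"] = q
    return collection_url + "?" + urllib.parse.urlencode(params)


def iter_entries(fetch_page, pagesize=1000, q=None):
    """Yield metadata entries page by page.

    fetch_page(start, pagesize, q) is supplied by the caller and returns
    the decoded list_response for that page.
    """
    start = 0
    while True:
        page = fetch_page(start, pagesize, q)
        for entry in page:
            yield entry
        if len(page) < pagesize:
            break  # a short (or empty) page is taken as the end of the collection
        start += len(page)
```

For example, list_url("http://localhost:8000/items/", 0, 10, "genus:Salmo") yields a URL with the query percent-encoded. Because indexes are not stable between calls, entries can be skipped or repeated while paging; a client that needs each item exactly once may want to de-duplicate on 'identifier'.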

Retrieve

Like List except the result is a set of items rather than metadata.

Method
GET
Target
http://<server>/<collection>/
retrieve_response <- retrieve(start=0, pagesize=1000, q=None)
q
Query that defines a subset of the collection. The syntax of the query is that of the Apache Lucene indexer.
start
Index of first value to retrieve
pagesize
The number of items to retrieve
retrieve_response
A JSON encoded list of items.

TODO: Define JSON encoding rules. XML as a hierarchical dictionary, binary objects as URLs.

Fields

Retrieves a list of fields from the index.

Method
GET
Target
http://<server>/<collection>/fields/

Field

Retrieves a list of distinct values for the specified field.

Method
GET
Target
http://<server>/<collection>/fields/<field name>/

Item Representation in JSON

XML

XML items are converted to a hierarchy of dictionaries.

xml:

<doc xmlns='http://default.name.space/' 
     xmlns:a='http://a.name.space/'>
  <a:item>
  some text
  </a:item>
</doc>

JSON:

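
The exact encoding rules are not defined in this document (see the TODO under Retrieve). As a rough sketch only, one possible reading of "a hierarchy of dictionaries" maps each element to a dictionary keyed by its namespace-qualified tag. The key format, the handling of text, and the function name element_to_dict are all assumptions.

```python
import json
import xml.etree.ElementTree as ET


def element_to_dict(element):
    """Convert an element into {tag: value}.

    The value is the stripped text for a leaf element, or a dictionary of
    child elements otherwise. Limitations of this sketch: attributes are
    ignored and repeated child tags overwrite one another.
    """
    children = list(element)
    if not children:
        return {element.tag: (element.text or "").strip()}
    value = {}
    for child in children:
        value.update(element_to_dict(child))
    return {element.tag: value}


def xml_to_json(xml_text):
    """Parse an XML document and return it as a JSON string."""
    return json.dumps(element_to_dict(ET.fromstring(xml_text)))
```

ElementTree reports tags in the form {namespace-uri}localname, so the namespace prefixes used in the source document do not appear in the keys.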