Technology choices and tools: narrative version

Short Introduction

In the context of the CIARD Content Management Task Force (CMTF), discussions have been held and a few activities have been planned for setting up an "Information Management Tools Wiki" (IMTW) on the AIMS website.
The objective of this Wiki is to support decision making in the choice of tools for managing information and making it available and accessible on the web.
The need for such a tool derives from the many requests FAO receives for recommendations of good tools, and from the amount of money and resources that Institutions in member countries spend on developing / outsourcing / customizing such tools.
 
Implementation
The IMTW is planned as an extension of the current Registry of Tools that is part of the VEST registry on AIMS.
The tools in the IMTW will be assessed against several criteria: the exercise below is the first step in the identification of relevant requirements.

Notes for wiki editors:
- please only add, do not delete what others have written;
- you can add any comments by clicking on the "Add new comment" link at the bottom of the page.

Table of Contents 


  1. INSTITUTIONAL CONTEXT AND INFRASTRUCTURE
    1A. Key policies / strategies for information creation / sharing / dissemination
    1B. IM/IT functional structure and Human Resources (roles and responsibilities, workflows)
    1C. Funding / resources
    1D. IT infrastructure: Deployment environment

  2. SCOPE OF THE CONTENT MANAGEMENT TOOL
    2A. Main function and content of the system
    2B. Content collection: quantity, processing, sources and flows
    2C. Content management: description and organization
    2D. Content dissemination

1. INSTITUTIONAL CONTEXT AND INFRASTRUCTURE

1A. Key policies / strategies for information creation / sharing / dissemination

  1. Does your Institution have (or plan to adopt) an Open Access mandate?
    If it does, you have probably already implemented an OA repository.
    If you plan to adopt an OA mandate, consider that the tool you are going to adopt has to allow you to comply with the OA requirements (see here).
    Important features that your tool must have:
    - Ability to provide access to full-text documents;
    - Ability to facilitate the self-archiving by authors/creators;
    - Ability to manage custom administrative metadata for the resources (like copyright information);
    - (If you plan to implement an OAI data provider) ability to implement the OAI protocol as data provider;
    - See also the preservation issue below. 
  2. Do you want to ensure the (digital) preservation of the contents you manage?
    (Digital) Preservation is essential for a good Open Access repository, but it is also important for any information system giving access to contents.
    To achieve this objective, consider the following features when you select an IM tool:
    - Ability to upload resources;
    - Ability to maintain stable URLs for resources and web pages;
    - Ability to use digital preservation standards (like OAIS);
    - Ability to set limitations to the file types that the system will ingest (you may have to migrate from some file types when they become obsolete). 
  3. Do you have copyright / licensing issues for the contents you disseminate?
    If you do, look for these features in your IM tool:  
    - Ability to manage custom administrative metadata for the resources (like copyright/licensing/access information);
    - Ability to filter viewing / downloading of resources according to the above administrative metadata.  
  4. Is your Institution engaged in partnerships and agreements whereby it commits to contribute data to specific service providers / engines / gateways?
    Examples of service providers that allow searching across participating repositories/sources: AGRIS, AgriFeeds, CABI Abstracts...
    Participating in such initiatives requires the ability to produce data / metadata in the agreed format, with the agreed protocols.
    Some requirements you may want to consider for a suitable IM tool:
    - In general, in order to be able to contribute data to any service provider: ability to define custom output formats; 
    Possible specific requirements:
    - Ability to implement an OAI data provider;
    - Ability to expose data as RSS feeds;
    - Ability to customize your XML outputs and RSS feeds to specific schemas. 
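As an illustration of the "expose data as RSS feeds" requirement above, the following is a minimal sketch of how records could be turned into an RSS 2.0 feed using only the Python standard library. The record fields (`title`, `url`, `published`) and the example URLs are assumptions made for this sketch; a real IM tool would map its own metadata fields and would usually offer this as a configurable feature.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone
from email.utils import format_datetime


def build_rss(channel_title, channel_link, channel_description, records):
    """Return an RSS 2.0 document (as a string) for a list of record dicts."""
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = channel_title
    ET.SubElement(channel, "link").text = channel_link
    ET.SubElement(channel, "description").text = channel_description
    for record in records:
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = record["title"]
        ET.SubElement(item, "link").text = record["url"]
        # RSS 2.0 expects RFC 822 style dates in pubDate.
        ET.SubElement(item, "pubDate").text = format_datetime(record["published"])
    return ET.tostring(rss, encoding="unicode")


# Hypothetical records, for illustration only.
records = [
    {
        "title": "Annual report",
        "url": "https://repository.example.org/record/1",
        "published": datetime(2011, 5, 2, 9, 0, tzinfo=timezone.utc),
    },
]
feed = build_rss(
    "Example repository",
    "https://repository.example.org/",
    "Latest records",
    records,
)
```

A service provider that requires a specific schema would need additional elements or namespaces; that is where the ability to customize output formats becomes relevant.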

1B. IM/IT functional structure and Human Resources (roles and responsibilities, workflows)

  1. Do you foresee a submission and editorial workflow for your information system?
    For instance, do you foresee different roles for creating a record and approving/publishing it?
    Do you also foresee a separate role for the selection of the material that should go into the system?
    Most IM tools provide a basic publication workflow where a record can be created and then published by another user, but for more complex workflows (e.g. different permissions for internal and external submitters; or a rejection and re-submission process) it is recommended that the IM tool has a flexible and customizable workflow configuration.
  2. Who is responsible for managing the contents and the system?

    If librarians or information managers are responsible both for the management of contents and for the design of the information system, some features that should be considered when selecting a tool are:
    - Usability: the tool should not be conceived for IT people;
    - Easy management of metadata and export formats: librarians and IM specialists are familiar with metadata and may also want to be able to adopt specific metadata sets and export in specific formats; they should also be able to configure the basic settings for OAI-PMH and for RSS feeds without advanced IT skills;
    - Compliance with library standards: librarians should be consulted to understand with which standards the tool needs to be compliant, e.g. which vocabularies and reference/authority lists.

    If IT staff or an IT Unit is responsible for the system, more advanced features become relevant:
    - Ability to add custom programming code that integrates with the platform without preventing further upgrades of the tool;
    - A robust and clear software architecture and good documentation of it;
    - A programming language and a server platform with which the IT staff is familiar.

    If "content authors" are directly responsible for submitting and maintaining contents (e.g. your system is mainly a communication platform and it will be mostly communications staff who will manage the system, or if researchers publish contents/articles directly in the system), it is important to consider the usability features of the tool:
    - User-friendly editing interface;
    - Not too complex input forms (metadata): content authors are not necessarily aware of metadata and library cataloguing conventions.

  3. Is some IT staff (permanently) available to the project?
    If there is no IT staff to support content authors or librarians, then it is also important that:
    - the tool comes with all the necessary features and configurations and doesn't require any tweaking;
    - the hardware and software infrastructure is hosted either with an external provider or in the cloud.
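The submission and editorial workflow raised in question 1 of this section (separate roles for creating and approving a record, with a rejection and re-submission process) can be pictured as a small state machine. The sketch below is only an illustration: the state names, action names and roles are hypothetical, and a real tool would let you configure them rather than hard-code them.

```python
# Hypothetical states and role-based transitions, for illustration only.
# Key: (current state, action); value: (new state, roles allowed to act).
TRANSITIONS = {
    ("draft", "submit"): ("submitted", {"submitter", "editor"}),
    ("submitted", "approve"): ("published", {"editor"}),
    ("submitted", "reject"): ("rejected", {"editor"}),
    ("rejected", "resubmit"): ("submitted", {"submitter", "editor"}),
}


def apply_action(state, action, role):
    """Return the new state, or raise ValueError if the move is not allowed."""
    try:
        new_state, allowed_roles = TRANSITIONS[(state, action)]
    except KeyError:
        raise ValueError(f"action {action!r} is not valid in state {state!r}") from None
    if role not in allowed_roles:
        raise ValueError(f"role {role!r} may not perform {action!r}")
    return new_state


# A record is submitted, rejected, re-submitted and finally approved.
state = "draft"
state = apply_action(state, "submit", "submitter")
state = apply_action(state, "reject", "editor")
state = apply_action(state, "resubmit", "submitter")
state = apply_action(state, "approve", "editor")
```

When assessing a tool, a useful question is whether a table like `TRANSITIONS` can be edited from the administration interface or only by changing code.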

1C. Funding / resources

  1. Is budget for dedicated hardware available?
    There may be various reasons why it may be necessary to purchase dedicated hardware (e.g. servers), for instance control of the technical environment, performance guarantees...
    If budget for this is available, even technology options that require powerful machines and/or specific server environments and settings can be considered.
    Otherwise, the tool can be hosted on an external server (with a partner organization or with a hosting company), but this gives less control over the technical infrastructure and may force you to restrict your choice to tools that run on the platform (and under the configuration) available on the external server.
  2. Is budget for software purchases available?
    Although software licenses are seldom a major cost in a project, they may be prohibitive if no budget is available. If budget is available, even proprietary paid software can be selected, although other costs beyond the license should be considered (paid upgrades, cost of new custom functionalities, especially for non-open-source tools, etc.). Otherwise, free open-source software offers the best option both in the short and in the long run.
  3. Is budget for dedicated staff available?
    This affects the selection of a tool heavily. Different tools require different skills, and certain skills are more expensive than others. In general, staff with generic IT / web authoring skills are less expensive than staff with computer programming skills; within this second category, PHP programmers and MySQL experts are less expensive than Java programmers, Oracle experts and system administrators who can set up a Tomcat web server with a custom configuration.
    Therefore, before selecting a tool, it is essential to check its technical features and make sure that the budget available for the project can cover the cost of the staff needed.
    Some general considerations:
    - tools that can be installed on a standard LAMP environment do not require particularly advanced skills;
    - tools that let you configure the whole system (contents, layout, users, workflows) from a User Interface do not need programming skills, although they may involve a more or less steep learning process;
    - "commodity services" in the cloud usually don't require advanced IT skills, although in some cases they may be better exploited by IT specialists.

    It is also important to consider if budget for dedicated staff is available only in the short term or also in the long term:
    - Some tools only require an initial customization by a programmer and can then be managed and maintained by non-programmers; they are therefore more sustainable over time than tools that need programming to implement or modify even simple functionalities.
    - Some tools allow for better integration with other systems and for easy export of data, making it easy to port the system to a new platform if needed. This is also a sustainability issue, as porting a system from one tool to another can be a very expensive exercise.

1D. IT infrastructure: Deployment environment

  1. Does your Institution have a good deployment environment, with highly performing servers, good network connectivity and IT staff?
    If it does, you can opt for a "deployable" tool (a software tool that can be installed on any server), host it on your servers, fine-tune its performance and customize it according to your needs.
    Otherwise, other options that can be considered are:
    - Hosting the deployable content management tool on an external server;
    - Using a "content management tool" that is available as a commodity service in the cloud.

    If you are hosting a "deployable" tool, either within your Institution or outside, the following considerations apply for the hosting environment:

  2. Which operating systems and web servers are installed / can be installed on the hosting environment?
    This is relevant to the selection of a tool, as many tools are available only for specific web servers and need support for a specific programming language.
    As far as web applications are concerned, in general it is much easier to implement or find a hosting environment for PHP applications using a MySQL database than to implement an environment for Java or ASP or .NET applications using an Oracle or a Microsoft database.
  3. What kind of IT support is available?
    Whether the tool is hosted within the Institution or externally, some IT support for the maintenance of the server environment is needed.
    Besides, some technical support for the deployment and upgrading of the tool is necessary, either internal or outsourced or provided by the hosting company.
    The higher the skills provided by the available IT support, the

    If you are opting for a "commodity service" or a tool in the cloud, the following considerations apply:

  4. Can your data be exported?
    Your data will be hosted in the cloud: make sure that the service provides easy export functionalities and is interoperable with other client tools that you may want to use to access those data.
     

2. SCOPE OF THE CONTENT MANAGEMENT TOOL

2A. Main function and content of the system

  1. What is the main function you foresee for the system you want to implement?
    Even if you are planning to integrate different types of contents (documents, blog posts, multimedia...) and different types of functionalities (community, collaborative editing), it is important to define what the primary function of the platform will be. It may be difficult to find a tool that fulfills all requirements, so you may have to prioritize: identify the essential requirements for your primary function first, and then the requirements for your secondary functions.

    Managing contents
    This is essential in any kind of system. "Contents" can be articles, blog posts, web pages, documents, multimedia... There are certain operations that your IM tool has to be able to perform on these contents:
    - Ability to perform the basic data management functions: add/edit/delete, provide metadata, search, display, upload external electronic documents or other files, and generate stable links to those files as part of the metadata;
    - Ability to define categories / tags to better browse your contents;
    - Ability to define attractive search and browse interfaces for end-users;
    - Ability to generate and integrate help pages for end-users.

    More specific functionalities:

    Managing documents
    - Compliance with library standards as concerns the cataloguing form (metadata) and the indexing criteria (by author, by subject...)
    - Full text indexing of external electronic documents
    - Compliance with the most widely adopted bibliographic exchange standards (Dublin Core metadata, more advanced metadata standards like MODS, OAI-PMH protocol, RDF output using widely adopted RDF vocabularies)

    Hosting discussions / a community
    Besides basic content management, hosting discussions requires the authentication of users and the assignment of privileges to them. Discussions may need to be organized in a variety of ways (by thread, entry date, author, keyword).

    Collaborative editing
    For any form of collaborative writing, version control and user control are essential (who may change which part of the document, and who has actually made changes).
     

  2. Which types of contents do you need the tool to manage?

    Documents
    Managing printed documents may require library management functions (acquisition, circulation, lending, serials control etc.). Which functions are required depends on choices within the organisation. For electronic documents there is a choice to be made with regard to file formats: PDF is probably the most common format at the moment, but are word processor files or presentation formats also allowed?
    Compliance with appropriate bibliographic metadata standards is important.

    News / events
    News items are essentially short web documents with a specific structure and layout (for example a date range, a location and a link for more information). For the end-user interface an expiry date may be required, after which the item is no longer visible, and an embargo date, before which the item is not yet visible.
    If you plan to manage such contents in your system, it is essential to be able to expose them as RSS feeds.
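The embargo and expiry dates described above amount to a simple visibility rule. The following sketch shows one possible reading of it: an item is visible from its embargo date up to and including its expiry date. The field names `embargo_until` and `expires_on` are hypothetical, and whether the boundary dates themselves count as visible is an assumption of this sketch.

```python
from datetime import date


def is_visible(item, today):
    """True if today is not before the embargo date and not after the expiry date."""
    embargo = item.get("embargo_until")
    expires = item.get("expires_on")
    if embargo is not None and today < embargo:
        return False  # still under embargo
    if expires is not None and today > expires:
        return False  # already expired
    return True


# Hypothetical news items; the second has neither an embargo nor an expiry date.
news = [
    {
        "title": "Workshop announced",
        "embargo_until": date(2012, 3, 1),
        "expires_on": date(2012, 4, 1),
    },
    {"title": "Permanent notice"},
]
visible = [n["title"] for n in news if is_visible(n, date(2012, 3, 15))]
```

The same filter would normally be applied both to the web pages and to the RSS feed, so that embargoed items do not leak through syndication.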

    Multimedia
    On top of the requirements for any sort of electronic file, there may be a need to offer the end-user a form of online streaming of the video or audio files.

    Contact information / directories
    Contact information needs very frequent checking and updating, as organisations and persons change their contact details frequently. The system should allow systematic checks and changes.
    It is also important that the system allows you to customize the metadata.

    Heterogeneous datasets
    Surveys and other forms of research may result in compound datasets. Such datasets contain data files in appropriate formats (CSV for spreadsheets / tabular data, formats for statistical packages like SPSS, NetCDF for multidimensional arrays) and are distributed together with documentation files (describing the different files and the way the data was collected). Datasets are often distributed as a single compressed file to save space and bandwidth, and to keep the files together.

    Project information
    Project information itself comes with a number of structured elements (e.g. start and end dates) and a number of narrative elements (abstract, objectives). Project information systems should be able to link to contact information and to collections of documents or other outputs of the project.

2B. Content collection: quantity, processing, sources and flows

  1. How many records do you expect your tool to manage?
    ...
    Current number of objects      
    Estimated growth      
    A realistic estimate of expected size and growth is a good starting point for any discussion about the functionalities that are required. For example, if there are relatively few items in a collection, a search request may often result in zero hits. To avoid such frustration, a browsing interface is more appropriate.
    ...
  2.  
  3. Only digital contents or also contents to be digitized?
    Digitization workflows require that the same intellectual unit can be tracked through different stages of the digitization process (printed original, printed original prepared for scanning, scanned image files, optical character recognition (OCR)).
  4. Only new contents or contents to be imported?
    If (legacy) metadata needs to be imported, the system should be able to import popular formats that can be easily produced (like comma- or tab-delimited text, custom XML etc.).
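To give an idea of what importing delimited legacy metadata involves, here is a minimal sketch using Python's standard `csv` module. The column names in `COLUMN_MAP`, the target field names and the `;` separator for multiple authors are all assumptions made for this example; a real import would follow the conventions of the legacy file and of the target tool.

```python
import csv
import io

# Hypothetical mapping from legacy column names to target metadata fields.
COLUMN_MAP = {"Title": "title", "Author(s)": "creators", "Year": "date"}


def import_records(handle, delimiter=","):
    """Read delimited legacy metadata and return a list of normalized dicts."""
    records = []
    for row in csv.DictReader(handle, delimiter=delimiter):
        record = {}
        for source, target in COLUMN_MAP.items():
            record[target] = (row.get(source) or "").strip()
        # Split a multi-author cell into a list (assumes ";" as the separator).
        record["creators"] = [
            name.strip() for name in record["creators"].split(";") if name.strip()
        ]
        records.append(record)
    return records


# A tab-delimited sample with one record, standing in for a legacy export file.
legacy = io.StringIO("Title\tAuthor(s)\tYear\nSoil survey\tDoe, J.; Roe, A.\t1998\n")
records = import_records(legacy, delimiter="\t")
```

The point to check in a tool is whether such a column mapping can be defined without programming, and how it reports rows that fail to import.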
     
  5. Also dynamic content from other sources?
      
    If the system is expected to integrate dynamic content from other sites, regular import procedures and filtering procedures need to be defined.
      - RSS aggregation: RSS (in its different versions) is the most commonly used XML format for news in any form;
      - Other XML imports;
      - Linked Data consumer: Linked Data, i.e. RDF in its different serializations (JSON, Turtle, RDF/XML), is likely to become the next-generation key exchange format.
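The import and filtering procedures mentioned above for dynamic content can be illustrated with RSS aggregation. The sketch below parses RSS 2.0 text and keeps only the items whose title contains a keyword. It is deliberately limited: it assumes RSS 2.0 without namespaces (RSS 1.0 and Atom feeds are structured differently), the sample feed is invented, and fetching the feed over the network and scheduling the import are left out.

```python
import xml.etree.ElementTree as ET


def parse_rss_items(xml_text, keyword=None):
    """Extract (title, link) pairs from RSS 2.0 text, optionally filtered by keyword."""
    root = ET.fromstring(xml_text)
    items = []
    for item in root.iter("item"):
        title = item.findtext("title", default="")
        link = item.findtext("link", default="")
        # Case-insensitive keyword filter on the title only.
        if keyword is None or keyword.lower() in title.lower():
            items.append((title, link))
    return items


# Invented sample feed standing in for a partner site's RSS output.
sample = """<rss version="2.0"><channel><title>Partner news</title>
<item><title>Rice yields rise</title><link>https://news.example.org/1</link></item>
<item><title>Staff changes</title><link>https://news.example.org/2</link></item>
</channel></rss>"""
selected = parse_rss_items(sample, keyword="rice")
```

In practice a tool with built-in aggregation handles the different RSS versions for you; the filtering rules are the part you usually need to define yourself.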
     

2C. Content management: description and organization

  1. Do you need to use your own custom classifications / authority lists?
    If your system uses controlled vocabularies, it may be necessary to check input against such authority files.
  2. Do you need to use specific external vocabularies (e.g. Agrovoc) / authority lists?
    Such external authority data may be offered through web services, an API or a SPARQL endpoint. To use those data, the tool should be able to “consume” such services...
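As a rough picture of what "consuming" a SPARQL service means, the sketch below builds a lookup request for a term and reads concept URIs from the response. Several things here are assumptions: the endpoint URL is a placeholder, the vocabulary is assumed to be published in SKOS (so terms are matched on `skos:prefLabel`), and the response is assumed to be in the standard SPARQL JSON results format. No network call is made; a canned response stands in for the server's answer.

```python
import json
from urllib.parse import urlencode

# Hypothetical endpoint; replace with the SPARQL endpoint of the vocabulary you use.
ENDPOINT = "https://vocab.example.org/sparql"


def build_lookup_url(label, lang="en"):
    """Build a SPARQL Protocol GET URL that looks up a SKOS concept by preferred label."""
    # Escape backslashes and double quotes so the label is a valid SPARQL literal.
    escaped = label.replace("\\", "\\\\").replace('"', '\\"')
    query = (
        "PREFIX skos: <http://www.w3.org/2004/02/skos/core#>\n"
        "SELECT ?concept WHERE { ?concept skos:prefLabel "
        f'"{escaped}"@{lang} }} LIMIT 1'
    )
    return ENDPOINT + "?" + urlencode({"query": query})


def concept_uris(response_text):
    """Return concept URIs from a SPARQL JSON results document."""
    bindings = json.loads(response_text)["results"]["bindings"]
    return [binding["concept"]["value"] for binding in bindings]


url = build_lookup_url("maize")
# Canned response in SPARQL JSON results format, standing in for a real reply.
canned = json.dumps({
    "head": {"vars": ["concept"]},
    "results": {"bindings": [
        {"concept": {"type": "uri", "value": "https://vocab.example.org/c_1"}}
    ]},
})
uris = concept_uris(canned)
```

An empty list of URIs would mean that the term entered is not a preferred label in the vocabulary, which is the kind of input check described above.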

2D. Content dissemination
An information system should be able to disseminate its content in formats that are appropriate for specific audiences. However, the relationship between audience and delivery formats is a complex one: market forces may result in rapid changes and geographic variations. For example, handheld devices may be important as a way to access information systems all over the world. Presently, handheld devices in developed countries are smartphones that can display web pages. Mobile phones are also popular in many developing countries, but devices and networks there do not support smartphone functionalities; SMS is an appropriate format to make information accessible in such an environment.

Methods of dissemination for consideration (this list is not exhaustive)

•    Web search / browse 
An important question is whether web users are expected to visit the website, or whether they are more likely to search for this specific content in generic web search engines. In the former case, investments in an attractive web interface are justifiable. In the latter case, it makes more sense to invest in search engine optimization to get the content indexed there. For some remarks about the appropriateness of search or browse interfaces, see 2B.1.
•    Web pages optimized for low bandwidth environments     
•    Optimized for mobile devices / Smart phones
•    RSS feeds     
RSS is the glue between many different web applications: RSS feeds can be embedded in external web pages, or can be used by commodity services to create e-mail notification systems.
•    Basic XML output (e.g. Dublin Core)     
Specific protocols may use standardized XML formats that may be built in a Content Management System. For example the harvesting protocol OAI-PMH by default expects data providers to provide items in a simple Dublin Core XML format. The Sitemap protocol for search engine optimization requires a site to produce XML files in a specific format.
•    Custom XML exports (e.g. Agris AP, MODS...)    
If you need to export information to systems that require output in more specific XML formats, the tool should let you customize XML output formats.
•    Linked data output 
The next generation of web applications will make use of Linked Open Data / Semantic Web technologies. These technologies encode information in “triples” that can be serialized in different formats: embedded in HTML or XML pages (RDFa), or as JSON / Turtle / RDF/XML etc. files. These files can be exposed as static files or generated as a response to a query (SPARQL endpoint).
•    Notifications (e-mail, SMS, print)
For many audiences these formats are the most appropriate way to access the latest content from a system.
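The simple Dublin Core XML format mentioned in the list above (the default format expected from OAI-PMH data providers) can be illustrated with a short sketch. It serializes one record as an `oai_dc` element using the standard library. The record content and its URL are invented, and the sketch leaves out the surrounding OAI-PMH response envelope and the schema location attributes that a complete data provider would add.

```python
import xml.etree.ElementTree as ET

# Namespaces of the oai_dc container and of the Dublin Core elements.
OAI_DC = "http://www.openarchives.org/OAI/2.0/oai_dc/"
DC = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("oai_dc", OAI_DC)
ET.register_namespace("dc", DC)


def to_oai_dc(record):
    """Serialize a record dict (DC element name -> list of values) as oai_dc XML."""
    root = ET.Element(f"{{{OAI_DC}}}dc")
    for field, values in record.items():
        # Dublin Core elements are repeatable, so each value gets its own element.
        for value in values:
            ET.SubElement(root, f"{{{DC}}}{field}").text = value
    return ET.tostring(root, encoding="unicode")


# Invented record, for illustration only.
xml_text = to_oai_dc({
    "title": ["Soil survey of the northern region"],
    "creator": ["Doe, J.", "Roe, A."],
    "date": ["1998"],
    "identifier": ["https://repository.example.org/record/42"],
})
```

More specific export formats follow the same pattern with different element names and namespaces, which is why a tool with configurable output templates is easier to adapt than one with a fixed export.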

 
Authors: 

valeria.pesce

hugo.besemer

© 2007 - 2020 Valeria Pesce