Information Organization and Access

Table of Contents


Background of Study.

A Brief About Metadata and Related Terminologies.

About the Organization.

Information Organization Tools used by NSDL.

Advantages and Disadvantages of the Information Organization Tools used by NSDL.



Introduction to Information Organization Research and Development Analysis

Critical analysis plays an important role in understanding and examining a particular task or activity. It does not mean finding fault with a topic; rather, it involves assessing the quality of the topic and critiquing the sub-topics or sub-systems that form part of it (Morse 2015). It includes understanding the strengths and weaknesses of the topic, the methodology used to complete the research, and how the same research could be carried out to obtain a better result. The critical analysis undertaken in this report aims to understand the steps taken by the National Science Digital Library (NSDL) to organize large volumes of data so that they can be handled efficiently without any loss of data. The NSDL was a seed project of the United States’ National Science Foundation (Fox, Gonsalves and Kipp 2002). The NSDL framework supports the reuse of metadata and follows an open-source approach. This lets users use the metadata for their own purposes and allows external organizations to contribute to it (Park and Bui 2013). The report briefly discusses the NSDL, its approach to handling data, and the advantages and disadvantages of the methodology it uses.

Background of Study

Digital libraries are a technology used to hold material such as text, images and charts in digital form. As per Candela, Castelli and Pagano (2011), digital libraries have formed a foundation of the knowledge society and play a key role in making information available to everyone, irrespective of background. By making information available from sources ranging from research centres, schools, colleges and universities to museums, they have contributed immensely to the ease of data collection and interpretation. In short, they have revolutionized the whole knowledge management lifecycle. Since online libraries have to deal with large volumes of data in the form of text and images, it becomes important to plan and establish an effective way to handle, store and present this data over long periods of time with minimal risk of data loss.

A Brief About Metadata and Related Terminologies

The internet generates huge volumes of data on a daily basis. It is therefore important for most organizations to use or develop tools that handle and organize this data efficiently. With innovation marking its presence in almost every sector, digital libraries have become a go-to tool for most people, from researchers to students. Since their inception in the 1990s, they have helped make information available to anyone, anywhere, with access to the internet. Collecting and managing data can be very difficult for a physical library, and it becomes harder still when the data has to be managed digitally. Metadata is a term used in this context to refer to data about other data. In general terms, it is data that describes other data, which may be anything from a web page to a word-processing document or any other type of file. As per Chapple (2020), another way of describing metadata is to think of it as a short description or summary of a piece of data on the internet. Another term often associated with large volumes of data is the database. A database is a large, organized collection of data: a set of records that describe common entities and are connected by relationships (Onwuchekwa 2012). As per Wang (2020), big data helps to innovate digital library services and to provide a well-designed framework structure and enhanced visualization of data for the user. Digital libraries hold huge amounts of data which, when linked with data on the internet, can generate benefits for their users (Ball 2018). Another concept used in digital libraries is deep learning. When a user makes a search request, a very large collection of results may be presented, which makes for a poor user experience. In such cases, deep learning methods can be applied to screen the data effectively before it is presented to the user.
Data visualization plays an important role in digital libraries. It is one of the major considerations when designing the framework for a digital library. Information visualization can be described as the process of converting data into a visual form so that it can be understood by its users (Liao, Gao and Yan 2012).

About the Organization

The United States’ National Science Digital Library (NSDL) was founded in 2000 by the National Science Foundation (NSF) with the aim of creating a hub through which users could access STEM knowledge gathered from a variety of sources. It has been an important source of results from exemplary research across science, technology, engineering and mathematics. It has also made available some of the NSF's most strongly supported documents. In addition, it provides tools and services that enhance the use of the content provided by the library. By setting up the NSDL, the NSF aimed to provide its key users, teachers and students, with resources that could serve them for a lifetime (Zia 2001).

Information Organization Tools used by NSDL

From its inception until 2003, the NSDL focused on designing its architecture and main portal. This responsibility was carried out by the organization's Core Integration (CI) Team, which provided basic options such as browsing content. In the following years, the CI team focused on creating different pathways for people with different backgrounds, such as teachers and scientists. For this purpose, the team had to present a better architecture than the metadata-based framework. This was accomplished by categorizing the data into nodes connected over a network. This NSDL information network was implemented in open-source software known as Fedora. A common approach in most digital libraries is to use different forms of metadata to describe learning objects and to drive the search engine for the website. The NSDL metadata repository is based on an Oracle database. As per Lagoze, et al (2005), all metadata records are stored in a collection of tables, both as individual Dublin Core metadata elements and as full XML metadata records. These records are collected from providers using OAI-PMH, the protocol used for harvesting metadata. The protocol defines a set of rules that govern how harvested metadata from an OAI provider is added to the repository. The ingest service of the metadata repository carries out harvests at set intervals, with the frequency determined by how often the collection changes. During this process, all metadata records are checked for validity against the metadata schema. The services of the metadata repository include the search engine, which uses the Lucene full-text indexing system together with the text content of the web pages. As per Sun (2019), Apache Lucene is one of the most widely used full-text search engines. Apache Lucene is a Java-based library for full-text search of documents.
It can also be embedded in Android apps and in web back-end servers. It is based on a pipeline structure that processes raw data. Such pipelines can be used to recognize regions of text marked by double quotation marks, also referred to as dialogue (Sparling).
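The storage and indexing ideas described above can be sketched in a few lines. The following is a minimal illustration, not the NSDL's or Lucene's actual implementation: it parses one Dublin Core XML record into individual elements and builds a toy inverted index as a stand-in for full-text indexing. The sample record, its identifier and the function names are all hypothetical; only the Dublin Core namespace URI is a real, published value.

```python
import re
import xml.etree.ElementTree as ET
from collections import defaultdict

# Dublin Core element namespace (a published, stable URI).
DC_NS = "http://purl.org/dc/elements/1.1/"

# Hypothetical sample record; the identifier and text are invented.
SAMPLE_RECORD = """
<record xmlns:dc="http://purl.org/dc/elements/1.1/">
  <dc:identifier>example:record-1</dc:identifier>
  <dc:title>Introduction to Plate Tectonics</dc:title>
  <dc:creator>Example Author</dc:creator>
  <dc:subject>Earth science</dc:subject>
</record>
"""


def parse_dublin_core(xml_text):
    """Return a dict mapping each Dublin Core element name to its values."""
    root = ET.fromstring(xml_text)
    fields = defaultdict(list)
    for child in root:
        # ElementTree renders qualified tags as "{namespace}localname".
        if child.tag.startswith("{" + DC_NS + "}"):
            name = child.tag.split("}", 1)[1]
            fields[name].append((child.text or "").strip())
    return dict(fields)


def build_inverted_index(records):
    """Map each lowercase token to the identifiers of records containing it."""
    index = defaultdict(set)
    for fields in records:
        record_id = fields["identifier"][0]
        for values in fields.values():
            for value in values:
                for token in re.findall(r"[a-z0-9]+", value.lower()):
                    index[token].add(record_id)
    return index


def search(index, query):
    """Return identifiers of records that contain every token in the query."""
    tokens = re.findall(r"[a-z0-9]+", query.lower())
    if not tokens:
        return set()
    result = set(index.get(tokens[0], set()))
    for token in tokens[1:]:
        result &= index.get(token, set())
    return result
```

Calling `search(build_inverted_index([parse_dublin_core(SAMPLE_RECORD)]), "plate tectonics")` looks up each query word in the index and intersects the matching record sets, which is the basic idea behind full-text search; a production engine such as Lucene adds analysis, ranking and on-disk index structures that this sketch omits.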

The services built on the metadata repository interact with it using OAI-PMH to incrementally harvest new and updated metadata records. OAI-PMH is short for Open Archives Initiative Protocol for Metadata Harvesting. It is used by data providers to expose their metadata. A harvester is a client application that issues OAI-PMH requests; it is operated by a service provider as a means of collecting metadata from repositories. A repository is a network-accessible server that can process the six OAI-PMH requests and is managed by a data provider to expose metadata to harvesters. OAI-PMH distinguishes three entities related to the metadata it makes available: the resource, the item and the record (Hornik, 2017).
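To make the harvesting requests concrete, the sketch below builds OAI-PMH request URLs without contacting any server. The six verbs, the `metadataPrefix` and `from` arguments and the `oai_dc` prefix come from the OAI-PMH specification; the endpoint URL and the function names are assumptions made purely for illustration and do not describe the NSDL's actual harvester.

```python
from urllib.parse import urlencode

# The six request types (verbs) defined by OAI-PMH.
OAI_VERBS = (
    "Identify",
    "ListMetadataFormats",
    "ListSets",
    "ListIdentifiers",
    "ListRecords",
    "GetRecord",
)

# Hypothetical endpoint, used only for illustration.
BASE_URL = "https://repository.example.org/oai"


def build_request(verb, **arguments):
    """Build the URL for one OAI-PMH request; no network call is made."""
    if verb not in OAI_VERBS:
        raise ValueError(f"unknown OAI-PMH verb: {verb}")
    params = {"verb": verb}
    for key, value in arguments.items():
        if value is not None:
            # "from" is a Python keyword, so callers pass from_ instead.
            params[key.rstrip("_")] = value
    return BASE_URL + "?" + urlencode(params)


def incremental_harvest_url(last_harvest_date):
    """Request only the records added or changed since the last harvest."""
    return build_request(
        "ListRecords", metadataPrefix="oai_dc", from_=last_harvest_date
    )
```

For example, `incremental_harvest_url("2005-01-01")` produces a `ListRecords` request restricted to records with a datestamp on or after that date, which is how a harvester can stay up to date without fetching the whole repository again.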

The NSDL also uses Fedora as a storage system for its vast collection of data in various forms, such as text and images. The Fedora architecture is an extensible software framework for storing and managing complex objects and the links between them. It encapsulates local and distributed information in separate modules known as objects, and records the relationships between objects and services. This in turn allows an object to have multiple representations, some of which are generated dynamically. Fedora expresses these relationships using the Resource Description Framework (RDF), which describes how modules and their components relate to one another. The architecture is implemented as a web service, with all aspects of the object framework and other management functions exposed through SOAP and REST interfaces.
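The idea of relating objects through RDF can be illustrated with plain subject-predicate-object triples. This is only a conceptual sketch in standard Python, not Fedora's API: the object identifiers, the predicate names `isMemberOf` and `annotates`, and the helper functions are hypothetical examples.

```python
from collections import namedtuple

# An RDF statement is a (subject, predicate, object) triple.
Triple = namedtuple("Triple", ["subject", "predicate", "object"])

# Hypothetical objects and relationships, invented for illustration.
TRIPLES = [
    Triple("resource:volcano-lesson", "isMemberOf", "collection:earth-science"),
    Triple("resource:quake-dataset", "isMemberOf", "collection:earth-science"),
    Triple("annotation:review-1", "annotates", "resource:volcano-lesson"),
]


def match(triples, subject=None, predicate=None, obj=None):
    """Return the triples that agree with every argument that is not None."""
    return [
        t
        for t in triples
        if (subject is None or t.subject == subject)
        and (predicate is None or t.predicate == predicate)
        and (obj is None or t.object == obj)
    ]


def members_of(triples, collection):
    """List the objects related to a collection by the isMemberOf predicate."""
    return sorted(
        t.subject for t in match(triples, predicate="isMemberOf", obj=collection)
    )
```

Here `members_of(TRIPLES, "collection:earth-science")` walks the relationship graph to find every object in a collection. Storing relationships as data rather than as fixed table columns is what lets new kinds of links between objects be added without redesigning the store.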

Advantages and Disadvantages of the Information Organization Tools used by NSDL

The NSDL makes use of an OAI-PMH based framework for its search options and of Fedora to handle the vast amounts of data and to serve as a storage management system. The OAI-PMH framework uses standard formats for the transfer of data. It also uses the Extensible Markup Language (XML) to represent the information returned in response to a request. OAI-PMH also simplifies the logical compilation of the retrieved information. On the other hand, the framework has some disadvantages that offset these advantages. It does not provide a rich search methodology that lets users filter content by criteria such as the author, the title or any other keyword. Selective harvesting is limited to the datestamp of a record, that is, the date on which the content was registered or last changed, and to set membership where the data provider defines sets. Another disadvantage is the time delay associated with such a framework. A harvest can take a long time if the number of records involved is large. Once the number of records exceeds about ten thousand, a harvest can take hours, and if the records run into the millions it can take days to deliver them all (Garcia, et al. n.d.). Fedora, which serves as the storage management system for the NSDL, is by contrast a more user-friendly storage management system. It also provides a single point of information about how files and data are stored in the system, so that the user does not need to consider multiple tools.
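Both limitations can be seen in a small simulation. The sketch below assumes an in-memory stand-in for a data provider that returns records in fixed-size batches and hands back a resumption token for the next batch, which is the paging mechanism OAI-PMH defines. The records, the batch size, the token format and the function names are invented for illustration, and no timings are implied.

```python
# Hypothetical records; identifiers, dates and creators are invented.
RECORDS = [
    {
        "id": f"rec-{i}",
        "datestamp": "2004-06-01" if i < 4 else "2005-03-01",
        "creator": "A" if i % 2 == 0 else "B",
    }
    for i in range(10)
]


def make_provider(records, batch_size):
    """Simulated data provider that pages its answers with resumption tokens."""

    def list_records(from_date=None, resumption_token=None):
        if resumption_token is not None:
            # The token carries the state needed to continue the same request.
            from_date, offset_text = resumption_token.split("|")
            offset = int(offset_text)
        else:
            from_date, offset = from_date or "", 0
        # ISO dates compare correctly as strings; date is the only filter.
        selected = [r for r in records if r["datestamp"] >= from_date]
        batch = selected[offset:offset + batch_size]
        next_offset = offset + batch_size
        token = f"{from_date}|{next_offset}" if next_offset < len(selected) else None
        return batch, token

    return list_records


def harvest(list_records, from_date=None):
    """Follow resumption tokens until the provider has no more batches."""
    batch, token = list_records(from_date=from_date)
    harvested, requests = list(batch), 1
    while token is not None:
        batch, token = list_records(resumption_token=token)
        harvested.extend(batch)
        requests += 1
    return harvested, requests
```

With `provider = make_provider(RECORDS, batch_size=3)`, a full `harvest(provider)` needs one sequential request per batch, so the number of round trips grows with the size of the collection. Filtering by creator is not something the provider offers in this model; the harvester has to fetch the records first and then filter them itself, for example with `[r for r in harvested if r["creator"] == "A"]`.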

Conclusion on Information Organization Research and Development Analysis

The NSDL has been one of the most popular information centres for people from many different backgrounds. It was conceived with the aim of supporting and improving education. Apart from providing search options and enabling its users to access information, it also needs to set up proper resources that can handle, store and present data upon request. The NSDL framework is based on two major software components, namely the Lucene search engine and the Fedora storage management system. Lucene is a full-text search library from Apache; the search service built on it obtains its metadata from the repository through OAI-PMH. This architecture is used to handle the search options provided by the website, and it relies on harvesting to serve the searches made by users. The methodology used by the NSDL to handle its vast storage needs, on the other hand, is the Fedora storage management system. It is a user-friendly storage system used to form relationships between pieces of information, which are divided into modules known as objects. Using these as its key methodologies, the NSDL has successfully made information available to every person, irrespective of background.
