Hitech Solutions

SINCE 2004

Facilitating Document Annotation Using Content and Querying Value

Abstract

A large number of organizations today generate and share textual descriptions of their products, services, and actions. Such collections of textual data contain a significant amount of structured information, which remains buried in the unstructured text. While information extraction algorithms facilitate the extraction of structured relations, they are often expensive and inaccurate, especially when operating on text that does not contain any instances of the targeted structured information. We present a novel alternative approach that facilitates the generation of structured metadata by identifying documents that are likely to contain information of interest, where that information will subsequently be useful for querying the database. Our approach relies on the idea that humans are more likely to add the necessary metadata at creation time if prompted by the interface, or that it is much easier for humans (and/or algorithms) to identify the metadata when such information actually exists in the document, instead of naively prompting users to fill in forms with information that is not available in the document. As a major contribution of this paper, we present algorithms that identify structured attributes that are likely to appear within the document by jointly utilizing the content of the text and the query workload. Our experimental evaluation shows that our approach generates superior results compared to approaches that rely only on the textual content or only on the query workload to identify attributes of interest.

Existing System

Annotation strategies that use attribute-value pairs are generally more expressive, as they can contain more information than untyped approaches. A recent line of work on more expressive queries that leverage such annotations is the "pay-as-you-go" querying strategy in Dataspaces, where users provide data integration hints at query time. The assumption in such systems is that the data sources already contain structured information and the problem is to match the query attributes with the source attributes. Many systems, though, do not even have the basic "attribute-value" annotation that would make "pay-as-you-go" querying feasible. Annotations that use "attribute-value" pairs require users to be more principled in their annotation efforts. Users should know the underlying schema and field types to use; they should also know when to use each of these fields. With schemas that often have tens or even hundreds of available fields to fill, this task becomes complicated and cumbersome. As a result, data entry users ignore such annotation capabilities.
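The difference between untyped and attribute-value annotations can be illustrated with a small sketch. Everything here is hypothetical and chosen only for illustration: the field names, the sample values, and the `matches` helper do not come from the system described above.

```python
# Untyped annotation: a bag of tags. A query can only test membership,
# so "wind speed of at least 100" cannot be expressed.
untyped_tags = {"hurricane", "miami", "120"}

# Attribute-value annotation: each value is tied to a named field.
# (Field names are assumptions made for this example.)
typed_annotation = {
    "event_type": "hurricane",
    "location": "Miami",
    "wind_speed_mph": 120,
}


def matches(annotation, conditions):
    """Return True if every queried attribute exists and satisfies its predicate."""
    return all(
        attribute in annotation and predicate(annotation[attribute])
        for attribute, predicate in conditions.items()
    )


# A semi-structured query over the typed annotation.
strong_wind_query = {"wind_speed_mph": lambda value: value >= 100}
print(matches(typed_annotation, strong_wind_query))
```

The typed form supports range conditions on a specific field, which is what makes it more expressive, but it also shows the burden described above: the annotator has to know that a field such as `wind_speed_mph` exists and when to fill it.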

Proposed System

CADS (Collaborative Adaptive Data Sharing platform) is proposed: an "annotate-as-you-create" infrastructure that facilitates fielded data annotation. The goal of CADS is to encourage and lower the cost of creating well-annotated documents that can be immediately useful for commonly issued semi-structured queries. Our key goal is to encourage the annotation of documents at creation time, while the creator is still in the "document generation" phase, even though the techniques can also be used for post-generation document annotation. In our scenario, the author generates a new document and uploads it to the repository. After the upload, CADS analyzes the text and creates an adaptive insertion form. The form contains the best attribute names given the document text and the information need (query workload), and the most probable attribute values given the document text. The author (creator) can inspect the form, modify the generated metadata as necessary, and submit the annotated document for storage.
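As a rough illustration of how attribute names might be chosen from both the document text and the query workload, here is a minimal sketch. It is not the CADS algorithm: the keyword lists, the two scoring functions, and the choice to multiply the two scores are all assumptions made for this example, since the text above does not specify how the two signals are combined.

```python
import re


def content_value(keywords, document_text):
    """Fraction of an attribute's hint keywords that occur in the document."""
    if not keywords:
        return 0.0
    words = set(re.findall(r"[a-z0-9]+", document_text.lower()))
    return sum(1 for keyword in keywords if keyword in words) / len(keywords)


def querying_value(attribute, workload):
    """Fraction of past queries that reference the attribute."""
    if not workload:
        return 0.0
    return sum(1 for query in workload if attribute in query) / len(workload)


def suggest_attributes(document_text, attribute_keywords, workload, top_k=3):
    """Rank attributes by content value times querying value (an assumed combination)."""
    scores = {
        attribute: content_value(keywords, document_text)
        * querying_value(attribute, workload)
        for attribute, keywords in attribute_keywords.items()
    }
    # Highest score first; ties are broken alphabetically for determinism.
    ranked = sorted(scores, key=lambda attribute: (-scores[attribute], attribute))
    return [attribute for attribute in ranked if scores[attribute] > 0][:top_k]


# Hypothetical schema hints and query workload.
attribute_keywords = {
    "wind_speed": ["wind", "mph"],
    "rainfall": ["rainfall", "inches"],
    "location": ["coast", "city"],
    "author": ["written", "by"],
}
workload = [{"wind_speed", "location"}, {"wind_speed"}, {"rainfall"}, {"author"}]
document = "Hurricane warning issued: wind speed 120 mph near the coast, rainfall expected"

print(suggest_attributes(document, attribute_keywords, workload))
```

In this sketch an attribute that is queried often but has no support in the text (here `author`) is left off the form, and so is one that appears in the text but is never queried. That mirrors the idea of prompting only for metadata that both exists in the document and is useful for querying.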
