
  • Feb 22, 12

    The OpenAmplify Web Service is the first and only of its kind. It exposes, via an open API, 250 man-years of development effort in a web service based upon more than a dozen granted patents. OpenAmplify simply does a better job of surfacing the meaning of web content, at massive scale and speed. Here's an overview of the thinking and technology that makes it possible.

  • Feb 16, 12

    TUT is a project for the development of a collection of morphologically, syntactically and semantically annotated Italian sentences; it includes:
    the definition of a native representation format (the TUT format), which is dependency-oriented and aims to capture the richness of the predicate-argument structure, a crucial layer of representation for several NLP tasks such as parsing, Information Extraction, Machine Translation and Question Answering;
    the conversion to the Penn Treebank format, to other constituency-based formats, and to a format based on Combinatory Categorial Grammar (see the TUTtoPENN converter web page and the CCG-TUT web page), which increases the possibilities for comparison, evaluation and portability of the resource.

  • Feb 16, 12

    Penn Treebank II tag set

    Pattern and MBSP assign meaningful tags to words and groups of words in a sentence. Each tag is a short code (such as "DT" for "determiner").

  • Feb 16, 12

    Penn Treebank II Tags

    Note: This information comes from "Bracketing Guidelines for Treebank II Style Penn Treebank Project" - part of the documentation that comes with the Penn Treebank.

    Contents:

    Bracket Labels
    Clause Level
    Phrase Level
    Word Level
    Function Tags
    Form/function discrepancies
    Grammatical role
    Adverbials
    Miscellaneous
    Index of All Tags

  • Feb 16, 12

    The tagset used in tagging the demo corpus available here is the Penn Treebank Tag set, described for example in Mitchell P. Marcus, Beatrice Santorini, and Mary Ann Marcinkiewicz: Building a Large Annotated Corpus of English: The Penn Treebank, in Computational Linguistics, Volume 19, Number 2 (June 1993), pp. 313--330 (Special Issue on Using Large Corpora). The tagging was done at UPenn. The following part-of-speech tags are used in the corpus:
    1. CC Coordinating conjunction
    2. CD Cardinal number
    3. DT Determiner
    4. EX Existential there
    5. FW Foreign word
    6. IN Preposition or subordinating conjunction
    7. JJ Adjective
    8. JJR Adjective, comparative
    9. JJS Adjective, superlative
    10. LS List item marker
    11. MD Modal
    12. NN Noun, singular or mass
    13. NNS Noun, plural
    14. NP Proper noun, singular
    15. NPS Proper noun, plural
    16. PDT Predeterminer
    17. POS Possessive ending
    18. PP Personal pronoun
    19. PP$ Possessive pronoun
    20. RB Adverb
    21. RBR Adverb, comparative
    22. RBS Adverb, superlative
    23. RP Particle
    24. SYM Symbol
    25. TO to
    26. UH Interjection
    27. VB Verb, base form
    28. VBD Verb, past tense
    29. VBG Verb, gerund or present participle
    30. VBN Verb, past participle
    31. VBP Verb, non-3rd person singular present
    32. VBZ Verb, 3rd person singular present
    33. WDT Wh-determiner
    34. WP Wh-pronoun
    35. WP$ Possessive wh-pronoun
    36. WRB Wh-adverb
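
A tag list like the one above is usually consumed as a lookup table. The following is a minimal sketch, assuming a hand-tagged example sentence; the names `PENN_TAGS` and `describe` are hypothetical and only a subset of the tags is included.

```python
# Subset of the part-of-speech tags listed above, keyed by tag code.
PENN_TAGS = {
    "CC": "Coordinating conjunction",
    "DT": "Determiner",
    "IN": "Preposition or subordinating conjunction",
    "JJ": "Adjective",
    "NN": "Noun, singular or mass",
    "NNS": "Noun, plural",
    "VBD": "Verb, past tense",
    "VBZ": "Verb, 3rd person singular present",
}


def describe(tagged_tokens):
    """Return (word, tag, description) triples; unlisted tags get 'unknown tag'."""
    return [(w, t, PENN_TAGS.get(t, "unknown tag")) for w, t in tagged_tokens]


# Hand-tagged sentence: an assumption for illustration, not real tagger output.
sentence = [("the", "DT"), ("cat", "NN"), ("sat", "VBD"),
            ("on", "IN"), ("the", "DT"), ("mat", "NN")]

for word, tag, desc in describe(sentence):
    print(f"{word}\t{tag}\t{desc}")
```

Falling back to "unknown tag" rather than raising an error keeps the lookup usable with taggers that emit codes outside this subset.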

  • Feb 16, 12

    Just as a noun functions as the Head of a noun phrase, a verb functions as the Head of a verb phrase, an adjective functions as the Head of an adjective phrase, and so on. We recognise five phrase types in all:

     

    Phrase Type           Head         Example
    Noun Phrase           Noun         [the children in class 5]
    Verb Phrase           Verb         [play the piano]
    Adjective Phrase      Adjective    [delighted to meet you]
    Adverb Phrase         Adverb       [very quickly]
    Prepositional Phrase  Preposition  [in the garden]

  • Feb 15, 12

    A natural language parser is a program that works out the grammatical structure of sentences, for instance, which groups of words go together (as "phrases") and which words are the subject or object of a verb. Probabilistic parsers use knowledge of language gained from hand-parsed sentences to try to produce the most likely analysis of new sentences. These statistical parsers still make some mistakes, but commonly work rather well. Their development was one of the biggest breakthroughs in natural language processing in the 1990s. You can try out our parser online.
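
The idea of choosing "the most likely analysis" can be illustrated with a toy probabilistic grammar and a CKY-style chart search. This is only a sketch under stated assumptions: the grammar rules, the probabilities and the function name `cky` are invented here, and nothing below is claimed to be the method of the parser described above. A real statistical parser would estimate its probabilities from hand-parsed sentences.

```python
from collections import defaultdict

# Toy grammar in Chomsky normal form; every rule and probability is invented.
BINARY = {  # (left child, right child) -> [(parent, probability)]
    ("NP", "VP"): [("S", 1.0)],
    ("DT", "NN"): [("NP", 1.0)],
    ("VBD", "NP"): [("VP", 1.0)],
}
LEXICON = {  # word -> [(tag, probability)]
    "the": [("DT", 1.0)],
    "dog": [("NN", 0.5)],
    "cat": [("NN", 0.5)],
    "chased": [("VBD", 1.0)],
}


def cky(words):
    """Return (probability, tree) for the best S parse, or None if there is none."""
    n = len(words)
    # chart[(i, j)] maps a label to (best probability, tree) for words[i:j].
    chart = defaultdict(dict)
    for i, word in enumerate(words):
        for tag, p in LEXICON.get(word, []):
            chart[(i, i + 1)][tag] = (p, (tag, word))
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span
            for k in range(i + 1, j):  # split point between the two children
                for left, (pl, tl) in chart[(i, k)].items():
                    for right, (pr, tr) in chart[(k, j)].items():
                        for parent, p in BINARY.get((left, right), []):
                            prob = p * pl * pr
                            best = chart[(i, j)].get(parent, (0.0, None))[0]
                            if prob > best:
                                chart[(i, j)][parent] = (prob, (parent, tl, tr))
    return chart[(0, n)].get("S")


result = cky("the dog chased the cat".split())
if result is not None:
    probability, tree = result
    print(probability)
    print(tree)
```

The chart keeps only the highest-probability analysis per label and span, which is what makes the search tractable; a sentence the grammar cannot cover simply yields `None`.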

  • Feb 23, 12

    Open source software capable of solving almost any text processing problem.

  • Kea

    Feb 23, 12

    Keywords and keyphrases (multi-word units) are widely used in large document collections. They describe the content of single documents and provide a kind of semantic metadata that is useful for a wide variety of purposes. The task of assigning keyphrases to a document is called keyphrase indexing. For example, academic papers are often accompanied by a set of keyphrases freely chosen by the author. In libraries, professional indexers select keyphrases from a controlled vocabulary (also called Subject Headings) according to defined cataloguing rules. On the Internet, digital libraries and other repositories of data (Flickr, del.icio.us, blog articles, etc.) also use keyphrases (here called content tags or content labels) to organize their data and provide thematic access to it.
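
Keyphrase indexing against a controlled vocabulary can be sketched as a simple phrase-matching step. The example below is a deliberately naive illustration that ranks vocabulary phrases by raw frequency; it is not Kea's algorithm, and the function name `assign_keyphrases`, the vocabulary and the sample text are all assumptions made for this sketch.

```python
import re
from collections import Counter


def assign_keyphrases(text, vocabulary, top_n=3):
    """Rank controlled-vocabulary phrases by how often they occur in text."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    counts = Counter()
    for phrase in vocabulary:
        phrase_tokens = re.findall(r"[a-z0-9]+", phrase.lower())
        m = len(phrase_tokens)
        if m == 0:
            continue
        # Count every position where the phrase's tokens match exactly.
        hits = sum(
            1 for i in range(len(tokens) - m + 1)
            if tokens[i:i + m] == phrase_tokens
        )
        if hits:
            counts[phrase] = hits
    # Sort by count (descending), then alphabetically so ties are deterministic.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [phrase for phrase, _ in ranked[:top_n]]


# Hypothetical controlled vocabulary and document text.
vocabulary = ["digital libraries", "keyphrase indexing", "machine translation"]
text = ("Keyphrase indexing assigns keyphrases to documents. "
        "Digital libraries rely on keyphrase indexing for thematic access.")

print(assign_keyphrases(text, vocabulary))
```

Restricting candidates to the vocabulary mirrors what a professional indexer does with Subject Headings; free keyphrase extraction would instead generate candidates from the document itself.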
