The Kainoki Treebank – a parsed corpus of contemporary Japanese

1 An invitation to the user

What can you do with a parsed corpus that you couldn't do without? Let's say you wanted to see all the things that horses are depicted as doing in Japanese texts. You want to find all the predicates in relation to which 馬 is a subject (or part of a noun phrase subject formed on 馬). Now, along with expressions that are fairly reliably subjects such as 馬が, there are other expressions that may or may not be subjects: 馬は, 馬も, 馬さえ, 馬なら, 馬に, 馬から, 馬の, etc. Furthermore you know that there are sentences such as (1) where 馬 is the subject of three verbs, but only appears local to one of them.

(1): 馬が追い手を振りきって、走って、山へ逃げた

You know there might be pronouns that refer to previously mentioned horses, and you would like to look at the predicates that are associated with these pronouns. On top of this, there are expressions like (2) which depict horses as things that whinny and (3) where horses have white tails, but the status of 馬 as subject is not directly inferable by reference to particle marking or word order alone.

(2): 鳴く馬

(3): しっぽが白い馬

With a parsed corpus, you can find all the predicates in relation to which 馬 is a subject (or part of a subject noun phrase formed on 馬, or a subject pronoun co-referent with an antecedent occurrence of 馬). This is achieved by referring to specifications of grammatical role (i.e., subject), syntactic structure (e.g., does the subject relation hold from a single structural layer, or because there is inheritance through layers of subordination and coordination), and annotation for semantic relations (e.g., anaphoric dependencies established across multiple sentences).

Another example: Let's say you are interested in expressions like (4) in sentences such as (5).

(4): 笠を手に

(5): 笠を手に出かけた

Expression (4) is interesting because in the sentence there seems to be a missing verb (like 持って) that mediates the relation between 笠を and 手に. Expression (4) is especially interesting because (unlike expressions such as 「料理を中心に」or 「首都を皮切りに」) the relation between 笠を and 手に cannot be paraphrased as 「A を B として」. With a parsed corpus, you can search for all the subordinate clauses (/^IP-ADV/) that contain

i. an accusative object (< /^PP-OB1/)

ii. a postpositional phrase headed by に (< (/^PP/ < (/^P-ROLE/ < に)))

iii. nothing else (!<3 __)

This is achieved by the search pattern of (6).

(6): /^IP-ADV/ < /^PP-OB1/ < (/^PP/ < (/^P-ROLE/ < に)) !<3 __
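
To make the three conditions of (6) concrete, here is a minimal Python sketch that checks them against a hand-built tree. The tree for 笠を手に, the tuple encoding, and the function names are illustrative assumptions, not corpus output. The code only mimics the logic of the pattern and is not the search engine's implementation.

```python
import re

# A node is (label, children); a leaf word is a plain string.
# Hypothetical analysis of 笠を手に, built by hand for illustration.
tree = ("IP-ADV", [
    ("PP-OB1", [("NP", [("N", ["笠"])]), ("P-ROLE", ["を"])]),
    ("PP",     [("NP", [("N", ["手"])]), ("P-ROLE", ["に"])]),
])

def label(node):
    """Return the node label, or the word itself for a leaf."""
    return node if isinstance(node, str) else node[0]

def children(node):
    """Return the direct children of a node (a leaf has none)."""
    return [] if isinstance(node, str) else node[1]

def has_child(node, regex, test=lambda child: True):
    """True if some direct child has a label matching regex and passes test."""
    return any(re.search(regex, label(c)) and test(c) for c in children(node))

def matches_pattern_6(node):
    """Mimic: /^IP-ADV/ < /^PP-OB1/ < (/^PP/ < (/^P-ROLE/ < に)) !<3 __"""
    return (
        re.search(r"^IP-ADV", label(node)) is not None
        and has_child(node, r"^PP-OB1")
        and has_child(node, r"^PP",
                      lambda pp: has_child(pp, r"^P-ROLE",
                                           lambda p: has_child(p, r"^に$")))
        and len(children(node)) < 3   # !<3 __ : there is no third child
    )

# Adding an overt verb as a third child makes the clause fail condition iii.
with_verb = ("IP-ADV", children(tree) + [("VB", ["持っ"])])
print(matches_pattern_6(tree), matches_pattern_6(with_verb))
```

Note that every relation in the pattern is stated relative to the first node (/^IP-ADV/), which is why the sketch tests each condition on the same node.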

Click the ‘Try’ button to see results from the corpus.

These are just a few examples of what you can do with a parsed corpus, but couldn't do with a search engine in a browser or a concordancer. This is made possible by human analysis of the meaning of sentences, and organisation of the data based on a limited number of abstract principles. There are many questions about Japanese grammar that haven't been explored systematically, but are amenable to investigation through a parsed corpus. This is because a parsed corpus analyzes everything, rather than just the things that linguists have already identified as interesting. We invite you to explore and see what you may find.

2 String searches

In order to use the corpus effectively, you need to know in general terms what is inside and how it is organized. One simple way to access information without needing any background information beyond a general knowledge of Japanese is the “Basic string search” interface in the corpus, and its related functions. This is a very powerful tool which has features that simplify searches and make extensive use of wild cards unnecessary (unlike some other search engines). For example, you can easily search for a string that does not actually form a word in Contemporary Japanese, but nonetheless appears in texts as combinations of parts of words. Anyone who knows Japanese can say that the contiguous string 「にはい」does not spell a simplex word. A search that looks for a segment of that precise shape (that is, <wb>にはい<wb>, where “<wb>” indicates a word boundary) will yield nothing. (This search can be done by applying the ‘Strict’ option for string searches to the search expression にはい.)

But the string にはい does appear in combinations of parts of words (as long as those parts are written in ひらがな). For example, the ‘Character’ string search option looks for <wb>に||は||い<wb> (where “||” indicates that a word boundary is possible in that position.) This yields

「家<wb>に<wb>は<wb>い<wb>ない」 (家には居ない).

The ‘Liberal’ string search option allows word boundaries to be ignored when searching for a string, so it looks for ||に||は||い||. Added to the results from the ‘Character’ search, this option yields examples like these:

「簡単<wb>に<wb>は<wb>いか<wb>ない」(簡単には行かない)
「エジプト<wb>に<wb>はいろ<wb>う<wb>と<wb>して」(エジプトに入ろうとして)
「<wb>わけにはいかない<wb>」
「屋根<wb>に<wb>はい上がる」(屋根に這い上がる), etc.

Results can be specified to a certain extent by using the ‘Mine’ string search option. This allows you to introduce word boundaries into the string by typing a single space (symbolized here with an underscore “_”). For example, entering に_はい under this option looks for ||に<wb>はい||, yielding examples with verbs 入る and 這い, but none with verb 居る or 行く.

Results can be specified still further by choosing the ‘Strict’ option for string searches, which enforces word boundaries at the edges of the string. For example, entering に_はい under this option looks for <wb>に<wb>はい<wb>. This yields no results at present because there are no isolated words 肺, 灰, 杯, or 這い in the corpus (at least none written in ひらがな) that immediately follow に. Such a result also tells us that 這いあがる is one segment (that is, the whole sequence is analyzed as a single lexical item, rather than as two words). This is useful information: While productive auxiliary verbs following i- and e- combining stems (始める, 終わる, 出す, 止む, 忘れる, etc.) are treated as separate words, there are also sequences of verbs in i- and e- combining stem form in lexical compounds with other verbs: 這い上がる, かき集める, 言いふらす, 撒き散らす, 追いかける, etc. For these latter forms, there is no word boundary between verb 1 and verb 2.
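
The four options described so far differ only in where word boundaries are required, allowed, or forbidden. The following Python sketch models that reading on sentences given as lists of segments. It is an interpretation of the descriptions above, not the corpus software: the function names, the mode names as lowercase strings, and the segmentations in the usage lines are assumptions made for illustration. A literal space in the query stands for the boundary written "_" above.

```python
from typing import List, Set

def boundaries(words: List[str]) -> Set[int]:
    """Character offsets at which a word boundary falls in the joined text."""
    offsets, pos = {0}, 0
    for w in words:
        pos += len(w)
        offsets.add(pos)
    return offsets

def matches(words: List[str], query: str, mode: str) -> bool:
    """Return True if query occurs in the sentence under the given mode.

    liberal   : boundaries are ignored everywhere
    character : boundaries required at both edges, free inside
    mine      : boundaries exactly where a space was typed, edges free
    strict    : boundaries exactly where a space was typed, and at both edges
    """
    text = "".join(words)
    wb = boundaries(words)
    chunks = query.split(" ")
    pattern = "".join(chunks)
    # Offsets (relative to the match start) where the user typed a space.
    typed, pos = set(), 0
    for chunk in chunks[:-1]:
        pos += len(chunk)
        typed.add(pos)
    start = text.find(pattern)
    while start != -1:
        end = start + len(pattern)
        edges_ok = mode in ("liberal", "mine") or (start in wb and end in wb)
        if mode in ("liberal", "character"):
            inner_ok = True
        else:
            # Inside the match, a boundary must occur exactly where typed.
            inner_ok = all((p in wb) == ((p - start) in typed)
                           for p in range(start + 1, end))
        if edges_ok and inner_ok:
            return True
        start = text.find(pattern, start + 1)
    return False

inai = ["家", "に", "は", "い", "ない"]      # 家には居ない
haiagaru = ["屋根", "に", "はい上がる"]      # 屋根に這い上がる
print(matches(inai, "にはい", "character"), matches(haiagaru, "に はい", "mine"))
```

Under this model, に_はい in ‘Mine’ mode matches the 這い上がる example but not the 居ない example, and ‘Strict’ matches neither, in line with the results reported above.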

Results can be expanded using the Greedy string search function in combination with the ‘Liberal’, ‘Character’, ‘Mine’ and ‘Strict’ options. Greedy string search enables you to specify the upper limit of intervening characters that may appear at a word boundary site. For the string search に_はい, using Greedy string search in combination with the ‘Character’ string search option and setting the intervening character number to a value of 1 will yield examples containing things such as

「間には悪い感情」,
「基本的に鬼は怖いもの」,
「調べるのには向いていない」,
「活動には狙いが二つ」,
「チリにはない考え方」,
「おいしいにはおいしいけれど」, etc.

Setting the intervening character number to a higher value yields even more results.

Greedy string search interacts with the string search options in the following ways:

With the ‘Liberal’ option for a string A_BC, Greedy string search set at n allows n characters to intervene between A and BC (with no word boundary restrictions for A and for BC).
With the ‘Character’ option for a string A_BC, Greedy string search set at n allows n characters to intervene between A and BC (with word boundary restrictions after A and before BC).
With the ‘Mine’ option for a string A_BC, Greedy string search set at n allows n characters to intervene between A and BC (with no word boundary restrictions before A and after BC).
With the ‘Strict’ option for a string A_BC, Greedy string search set at n allows n characters to intervene between A and BC (with word boundary restrictions for A and for BC).

The principles by which Basic string search and Greedy string search operate for Kanji/Kana text do not change when the corpus is accessed in the Roman alphabet.

There are limitations to the string search: (i) the results are limited to a particular orthographic form (e.g., うま, ウマ, and 馬 all give different results); and (ii) searches based on lemmas are not possible (e.g., the expressions あります, あれば, ある, and あった all contain a morpheme with the same lemma, 有る, but to get results for all attestations of the word in question, the various forms must each be searched for separately: あり, アリ, 在り, 有り, あれ, アレ, 在れ, 有れ, etc.).

Accordingly, string searches will yield complete results for sentences containing a specific string, but string search is most useful in cases when either i) the user doesn't know how a string is segmented, or ii) the user doesn't know what structural assignments are given to elements in a sequence of words. For example, if you don't happen to know that in the corpus the string ならでは is marked up as a single word, searching for any of なら, らで, では, ならで, らでは, or ならではの using the ‘Liberal’ option will yield ならでは in the results. If you don't know that the string に向けて is marked up in two different ways, depending on the grammatical functions involved, using the ‘Liberal’ option with that string will yield the pattern (P-ROLE に向けて) and also the pattern (P-ROLE に) (VB 向け) (P-CONN て). (The significance of the patterns is discussed in section 3 below.)

In sum, any well-formed string search that has a corresponding pattern in the corpus will yield search results. The default displays show you, for each attested example, how the elements defined in the string search are distributed in the structure of that example. Examining the displays is one way to familiarize yourself with the segmentation practices with regard to morphemes in the corpus and with specifics of the structure of the corpus.

3 Basic structure

This corpus is primarily a description of texts. The texts are divided into basic units (primarily sentences and sentence fragments). A basic unit is organized on the principle of a tree in which a trunk branches into smaller parts, ending with leaves (the words of the text). Alternatively you can think of a big box with smaller boxes inside, and the smallest boxes contain blocks (the words of the text). Both ways of visualizing are useful. As a mathematical model, both visualizations amount to the same thing, but the simplest way to express a tree structure in electronic text is to use labeled brackets (the boxes within boxes idea), while in writing about tree structures we use the tree metaphor most often, together with some kinship terms (as might be used in a genealogical tree): ancestor::descendant; sibling::sibling; parent::child, etc. The only confusing thing about the tree metaphor is that linguistic trees are written upside down, with the leaves (words) at the bottom.

3.1 From leaf to root

First let's consider how trees can get built from the bottom up. This discussion is meant to give the user an idea of how a sentence might be analyzed. As an example, (7) is a sentence from the corpus.

(7): 鈴木さんの言葉はすがすがしくさえあった

As a piece of data this sentence forms a string of characters. It is possible to search for the whole sentence using the String search interface on-line. As this sentence is unique in the corpus, a properly formed search will yield only one entry in the result. A poorly formed search will yield nothing. Again, at the moment, the sentence above is only a string of characters, while sentences in the corpus have been parsed into segments and organized under a lattice of nodes. In order for String search to find a reflex in the corpus, it has to match strings to segments in some way or other. One way is to conduct a String search with either “Liberal” or “Character” selected. Either of these will regard each point between characters as a potential segment (word) boundary.

(8): 鈴||木||さ||ん||の||言||葉||は||す||が||す||が||し||く||さ||え||あ||っ||た

This will return a result of one entry, containing one attestation of each of the segments in the sentence. Another way is to segment the string exactly as it is segmented in the corpus, and then conduct a String search with “Mine” or “Strict” selected as a condition:

(9): 鈴木さん<wb>の<wb>言葉<wb>は<wb>すがすがしく<wb>さえ<wb>あっ<wb>た

This also will return a result of one entry.

While the data in the corpus is segmented in a specific way, the segmentation should be fairly easy for a user to predict, as it is mostly based on school grammar: The main parts of speech in school grammar (noun, verb, adjective, adverb, pre-noun, particle, auxiliary, etc.) are set apart as separate segments (words) and each is dominated by a pre-terminal node that is labeled with the name of the part-of-speech for that word.

(10)

NPR	P-ROLE	N	P-OPTR	ADJI	P-OPTR	VB2	AXD
鈴木さん	の	言葉	は	すがすがしく	さえ	あっ	た

NPR (“noun-proper”) is a subcategory of nouns. The pre-terminal node contains the segment, and this can be expressed in labelled bracket notation in the following way:

(11)

(NPR 鈴木さん)

P-ROLE is a sub-category of particles that express grammatical roles:

(12)

(P-ROLE の)

N is the label for common nouns:

(13)

(N 言葉)

P-OPTR is a subcategory of particles, including what are known as toritate particles:

(14)

(P-OPTR は)

ADJI is the category of い-adjectives, excluding auxiliary usages such as -たい, -がたい, -やすい, -にくい, and -づらい. While the ADJI すがすがしく in the example above is in the Infinitive inflection, this information is not included in the pre-terminal node label:

(15)

(ADJI すがすがしく)

P-OPTR appears after inflected forms as well:

(16)

(P-OPTR さえ)

VB2 is a sub-category of verb that follows a core predicate and carries out one or more grammatical functions. VB2s are usually derived from full-fledged verbs, but lose much of their lexical meaning when they appear as VB2s. Here a form of the verb ある appears as a ‘dummy’ to carry inflectional information for the preceding core predicate. Note that the actual form of the verb is a combining stem:

(17)

(VB2 あっ)

AXD is the category of auxiliaries for past-tense morphology:

(18)

(AXD た)

With the parts of speech established for each word, it is possible to refer to the immediate context in which words appear and infer what kind of structures they occupy. We can try this for the sentence above, segment by segment, working from left to right. The basic principle for building structure is that heads project phrases (constituents) and combine with other constituents under those phrases. Accordingly, NPR (as a sub-category of noun) will project an NP (noun phrase):

(19)

(NP (NPR 鈴木さん))

This NP is immediately followed by a genitive particle P-ROLE, which is a head projecting a PP. To put it another way, this NP modifies the following P-ROLE as its complement:

(20)

(PP (NP (NPR 鈴木さん))
    (P-ROLE の))

A genitive particle immediately preceding a noun typically marks the complement of that noun. Accordingly the PP is inside of the NP projected by the N which it precedes:

(21)

(NP (PP (NP (NPR 鈴木さん))
        (P-ROLE の))
    (N 言葉))

Again, a following particle P takes the NP as complement, projecting a PP containing that NP:

(22)

(PP (NP (PP (NP (NPR 鈴木さん))
            (P-ROLE の))
        (N 言葉))
    (P-OPTR は))

Noun phrases marked with focus particles typically relate to predicates (according to some grammatical role or other). The segment すがすがしく immediately following は has the part-of-speech of い-adjective (ADJI). This is one of the three basic types of expressions that form core parts of predicates: verbs, adjectives, and copular expressions. In general, “predicate” is an inflecting category that can project clauses. Predicates in Japanese can be formed by a single segment, but they are frequently formed of a string of segments of different types. For now let's assume that すがすがしく projects a clause that contains the immediately preceding PP. Clauses are labelled IP (inflectional phrase).

(23)

(IP (PP (NP (PP (NP (NPR 鈴木さん))
                (P-ROLE の))
            (N 言葉))
        (P-OPTR は))
    (ADJI すがすがしく))

The remaining elements さえ, あっ, and た can be seen as parts of a predicate that has an ADJI as its core. Specifically, a past tense adjectival predicate すがすがしかった has been broken up so that the core part of it can receive focus from the P-OPTR さえ. The other parts of the sequence (the verbal syntagm) appear at the same level as the core part.

(24)

(IP (PP (NP (PP (NP (NPR 鈴木さん))
                (P-ROLE の))
            (N 言葉))
        (P-OPTR は))
    (ADJI すがすがしく)
    (P-OPTR さえ)
    (VB2 あっ)
    (AXD た))

The resulting structure is a basic constituency tree for a sentence, but when necessary the corpus adds more information about grammatical function in the form of extensions to the node labels. For example, the PP 鈴木さんの言葉は is the subject of the ADJI すがすがしく, but the particle は doesn't specify that grammatical function, so more information needs to be added: The subject constituent is labelled PP-SBJ. Constituents take on grammatical functions when they combine with other elements, so the specification of function is made at the phrase level where combination of constituents takes place. The subject constituent combines with the predicate under the phrase that the predicate projects, so that is the level under which the labelling for function is added. By contrast, particle の inside of an NP has a function with a limited set of interpretations (possessor::possessed (言葉の音節); whole::part (言葉の殆ど); argument::predicate (言葉の解釈), etc.), and as such doesn't need an extension in that context. Finally, IPs always get an extension. The sentence above is a simple declarative utterance ending in a predicate. We label such sentences IP-MAT (inflectional phrase-matrix).

(25)

(IP-MAT (PP-SBJ (NP (PP (NP (NPR 鈴木さん))
                        (P-ROLE の))
                    (N 言葉))
                (P-OPTR は))
        (ADJI すがすがしく)
        (P-OPTR さえ)
        (VB2 あっ)
        (AXD た))
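
Because labelled brackets are plain text, a tree such as (25) can be read back into a nested data structure with a few lines of code. The Python sketch below is one possible way to do this. The parser, the tuple encoding, and the function names are assumptions made for illustration, and the sketch assumes well-formed input with balanced brackets.

```python
import re

def parse(text):
    """Parse one labelled-bracket tree into nested (label, children) tuples."""
    tokens = re.findall(r"\(|\)|[^\s()]+", text)
    pos = 0

    def node():
        nonlocal pos
        pos += 1                          # skip "("
        label = tokens[pos]
        pos += 1
        kids = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                kids.append(node())       # a nested phrase or pre-terminal
            else:
                kids.append(tokens[pos])  # a word (leaf)
                pos += 1
        pos += 1                          # skip ")"
        return (label, kids)

    return node()

def preterminals(tree):
    """Yield (part-of-speech label, word) pairs in text order."""
    label, kids = tree
    for kid in kids:
        if isinstance(kid, str):
            yield (label, kid)
        else:
            yield from preterminals(kid)

example_25 = """
(IP-MAT (PP-SBJ (NP (PP (NP (NPR 鈴木さん))
                        (P-ROLE の))
                    (N 言葉))
                (P-OPTR は))
        (ADJI すがすがしく)
        (P-OPTR さえ)
        (VB2 あっ)
        (AXD た))
"""

tree = parse(example_25)
pairs = list(preterminals(tree))
print(pairs)
print("".join(word for _, word in pairs))
```

Reading off the leaves from left to right recovers the segmentation in (9), and joining them gives back the original sentence (7).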

The above has been an example of the generation of structure for a relatively simple sentence, but the same basic principles of modification and argumenthood can be applied recursively to build very complex structures.

3.2 From root to leaf

Now let's consider a sentence from the top down, starting with the root. Each sentence (or sentence fragment) has a root node, which is in turn divided into smaller nodes (normally other complex phrases). Sentence fragments (typically truncated utterances that lack a predicate) have a root node labeled FRAG. Full sentences are labeled depending on the type of utterance they represent. A question is labeled CP-QUE, an exclamation is labeled CP-EXL, and a statement followed by a non-question sentence-final particle or an afterthought is labeled CP-FINAL. Each of these nodes typically contains an IP-SUB node plus some following element (a sentence-final particle or an afterthought). The remaining root node labels (for sentences that don't involve additional elements marking utterance types) are for statements (IP-MAT) and imperatives (IP-IMP).

All full utterances have an IP (inflectional phrase) clause node of some type as a main clause, but there can be other clauses embedded inside these. Subordinate clauses have the label IP-ADV-SCON. Such clauses typically modify predications specifying either manner or condition. Non-final coordinate clauses have the label IP-ADV-CONJ. Within a noun phrase, clauses directly modifying nouns are labeled either IP-REL or IP-EMB. With certain exceptions, an IP node directly contains a predicate of some sort as a head. Predicates are roughly defined as inflecting expressions that say something about a subject. A common predicate type is that of verbs (labeled VB). Directly under (that is “directly dominated by”) IP you will often find a node indicating a part-of-speech of the core predicate (e.g., VB) and under the part-of-speech label is the word itself (that is, a segment of text). The word is a leaf node. Incidentally, the other core predicates that head IPs are ADJI (い-adjective), ADJN (な-adjective), and [NP-PRD + AX] (nominal predicate plus copula).

There is an inventory of phrase types that appear as components under IP. Of course, IP-ADV can appear recursively under IP, as suggested above. The other principal phrase types are NP (bare noun phrase); PP (particle phrase); ADVP (adverbial phrase); INTJP (interjectional phrase); FRAG (fragment); PU (punctuation). Excepting PU, any of these phrases may be complex, containing other phrases in addition to a head.

When using brackets to describe a given structure, we may say that “node A directly contains node B” or that “node B is directly contained by node A” but we can describe the same relation as “node A directly dominates node B”, or “node B is directly dominated by node A”. In a tree, such a relationship is represented by a branch extending from a (higher) node A close to the root to a (lower) node B farther from the root. Trees are connected structures. That is, every element in a tree is related to every other element through branches. Branches express two independent relations: Precedence and direct dominance. These two relations can be combined to define other generalized relations in a tree (e.g., indirect dominance, immediately following sibling, dominated by root, etc.). So you can specify relations between nodes to create search expressions. If, for example, you wanted to find all the questions that are headed by a verb in the form of the combining stem 食べ, you could look for a CP-QUE that directly dominates an IP-SUB that directly dominates a VB that directly dominates the node 食べ.
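
The 食べ example amounts to following a chain of direct dominance from CP-QUE down to a word. The Python sketch below walks such a chain over a hand-built tree. The tree (a simplified analysis of a question like 食べたか with no subject node), the tuple encoding, and the function names are illustrative assumptions. Labels are compared exactly here, whereas corpus searches normally use regular expressions over labels.

```python
# Nodes are (label, children) tuples; words are plain strings.
# Hypothetical, simplified tree for a question such as 食べたか.
question = ("CP-QUE", [
    ("IP-SUB", [("VB", ["食べ"]), ("AXD", ["た"])]),
    ("P-FINAL", ["か"]),
])

def label(node):
    """Return the node label, or the word itself for a leaf."""
    return node if isinstance(node, str) else node[0]

def children(node):
    """Return the direct children of a node (a leaf has none)."""
    return [] if isinstance(node, str) else node[1]

def directly_dominates_chain(node, labels):
    """True if node carries labels[0] and a chain of direct children
    carries labels[1], labels[2], ... in that order."""
    if label(node) != labels[0]:
        return False
    if len(labels) == 1:
        return True
    return any(directly_dominates_chain(c, labels[1:]) for c in children(node))

def find_all(node, labels):
    """Collect every node in the tree from which the chain starts."""
    hits = [node] if directly_dominates_chain(node, labels) else []
    for c in children(node):
        hits.extend(find_all(c, labels))
    return hits

print(len(find_all(question, ["CP-QUE", "IP-SUB", "VB", "食べ"])))
print(len(find_all(question, ["CP-QUE", "VB"])))
```

The second call finds nothing because VB is dominated by CP-QUE only indirectly, through IP-SUB.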

Example search patterns

1 Passives

1.1 Direct Passives

Pattern (1) finds all occurrences of direct passives in the corpus. It looks for cases in which a passive auxiliary verb appears with a logical subject as a preceding sister. The てある construction, in which PASS dominates *, is excluded.

(1): /LGS/ < __ $.. (/PASS\b/ !< /^\*/)

Pattern (2) finds all occurrences of direct passives with omitted logical subjects.

(2): /LGS/ < /^\*/ $.. (/PASS\b/ !< /^\*/)

Pattern (3) finds all occurrences of direct passives with explicit logical subjects.

(3): /LGS/ !< /^\*/ $.. (/PASS\b/ !< /^\*/)
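
Patterns (2) and (3) differ only in whether the logical subject dominates a zero element, while the sister relation ($..) and the PASS condition stay the same. The Python sketch below mimics that logic over flat lists of sister nodes. The two sister sequences, the simplification that each node has a single child, and the function name are assumptions made for illustration and are not taken from the corpus.

```python
import re

# Hypothetical sister sequences under one clause node, built by hand.
# Each item is (label, label-or-word of its only child).
omitted = [("PP-SBJ", "NP"), ("NP-LGS", "*"), ("VB", "叱ら"),
           ("PASS", "れ"), ("AXD", "た")]
explicit = [("PP-SBJ", "NP"), ("PP-LGS", "NP"), ("VB", "叱ら"),
            ("PASS", "れ"), ("AXD", "た")]

def direct_passive(sisters, lgs_is_zero):
    r"""Mimic /LGS/ < /^\*/ $.. (/PASS\b/ !< /^\*/) when lgs_is_zero is True,
    and /LGS/ !< /^\*/ $.. (/PASS\b/ !< /^\*/) when it is False."""
    for i, (lab, child) in enumerate(sisters):
        if not re.search(r"LGS", lab):
            continue
        if bool(re.search(r"^\*", child)) != lgs_is_zero:
            continue
        # $.. : some later sister is PASS and does not dominate a zero element
        for later_lab, later_child in sisters[i + 1:]:
            if re.search(r"PASS\b", later_lab) and not re.search(r"^\*", later_child):
                return True
    return False

print(direct_passive(omitted, True), direct_passive(explicit, False))
```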

Pattern (4) finds all occurrences of direct passives with logical subjects marked by によって.

(4): /PP-LGS/ < (/P-ROLE/ < /によって/) $.. (/PASS\b/ !< /^\*/)

1.2 Indirect Passives

Pattern (5) finds all occurrences of indirect passives in the corpus.

(5): /LGS/ < __ $.. /PASS2\b/

Pattern (6) finds all occurrences of indirect passives with omitted logical subjects.

(6): /LGS/ < /^\*/ $.. /PASS2\b/
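
Patterns (1) to (6) rely on the word-boundary anchor \b to keep the label PASS apart from PASS2. The short Python check below illustrates the effect with the standard re module. It assumes that the corpus search engine treats \b the same way Python does for these ASCII labels, which is worth verifying against the search interface's own documentation.

```python
import re

labels = ["PASS", "PASS2"]

# /PASS\b/ requires a word boundary right after "PASS",
# so the digit in "PASS2" blocks the match.
direct = [l for l in labels if re.search(r"PASS\b", l)]

# /PASS2\b/ matches only the indirect-passive label.
indirect = [l for l in labels if re.search(r"PASS2\b", l)]

# Without \b, /PASS/ would match both labels.
unanchored = [l for l in labels if re.search(r"PASS", l)]

print(direct, indirect, unanchored)
```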

2 Causatives

Pattern (7) finds all occurrences of causatives in the corpus.

(7): /CZZ/ < __ $.. (/VB2/ < /^させ|^せ/)

Pattern (8) finds all occurrences of causativised transitive verbs.

(8): /OB1/ $.. (/VB2/ < /^させ|^せ/)

Pattern (9) finds all occurrences of causativised intransitive verbs.

(9): /VB2/ < /^させ|^せ/ !$,, /OB1/

Pattern (10) finds all occurrences of causatives with a causee marked by を.

(10): /P-CZZ/ < (/P-ROLE/ < /を/) $.. (/VB2/ < /^させ|^せ/)

Pattern (11) finds all occurrences of passivised causatives.

(11): /VB2/ < /^させ|^せ/ $. /PASS/

3 Secondary Verbs あげる/くれる/もらう

Pattern (12) finds all occurrences of sentences with a secondary verb くれる whose beneficiary is marked by に.

(12): /VB2/ < /^くれ/ $,, (/PP-OB2/ < (/P-ROLE/ < に))

Pattern (13) finds all occurrences of causatives immediately followed by a secondary verb もらう.

(13): /IP-SMC-OB1/ < (/VB2/ < /^させ|^せ/) $.. (/^VB\b/ < /^もら/)

4 Tense

Pattern (14) finds all occurrences of complex sentences in which both the matrix and subordinate clauses are tense marked by た.

(14): /SCON|CND/ < (/IP-ADV/ < /AXD/) $.. /AXD/

5 Interrogatives

Pattern (15) finds all occurrences of interrogatives with WH-words (embedded interrogative clauses are excluded).

(15): /CP-QUE/ << /^W/ !> /^CP|^IP/

*Pattern (16) is one way to find all matrix interrogatives, irrespective of the scope of your search (all files or a range of files).

(16): /^CP-QUE/ [ > TOP | $ ID ]

Pattern (17) finds all occurrences of interrogatives with a formal noun の/ん.

(17): /CP-QUE/ < (/IP-SUB/ < (/^FN/ < /の|ん/)) !> /^CP|^IP/

Pattern (18) finds all occurrences of negative interrogatives.

(18): /CP-QUE/ < (/IP-SUB/ < /^NEG\b/) !> /^CP|^IP/

Pattern (19) finds all occurrences of interrogatives marked by a final particle other than か.

(19): /CP-QUE/ < (/P-FINAL/ !< か) !> /^CP|^IP/

Pattern (20) finds all occurrences of interrogatives not marked by the final particle か.

(20): /CP-QUE/ !< (/P-FINAL/ < か) !> /^CP|^IP/

6 Double Subject Sentences

Pattern (21) finds all occurrences of double subject sentences in which the first subject is marked by は and the second one is marked by が.

(21): /PP-SBJ\b/ < (/P-OPTR/ < は) $ (/PP-SBJ2/ < (/P-ROLE/ < が))

7 Conjunctions

Pattern (22) finds all occurrences of sentences with conjunctive adverbial clauses headed by て.

(22): /IP-ADV-CONJ/ <- (/P-CONN/ < /^て|で$/)

Pattern (23) finds all occurrences of sentences with two consecutive conjunctive adverbial clauses headed by て.

(23): /IP-ADV-CONJ/ <- (/P-CONN/ < /^て|で$/) < (/IP-ADV-CONJ/ <- (/P-CONN/ < /^て|で$/))

Pattern (24) finds all instances of conjunction with や where there are 4 or more sisters at the conjunct level (note that sisters at the conjunct level can be punctuation, so this doesn't mean you are getting examples with four or more conjuncts).

(24): P-CONN < や > (CONJP > (__ <4 __))

8 Scrambling

Pattern (25) finds all occurrences of sentences in which a first object which is not a zero form is followed by a non-zero subject.

(25): /OB1/ !< /^\*/ $.. (/SBJ/ !< /^\*/)

Pattern (26) finds all occurrences of sentences in which a first object which is not a zero form and is marked by は is followed by a non-zero subject.

(26): /PP-OB1/ < (/P-OPTR/ < /は/) $.. (/SBJ/ !< /^\*/)

9 Ellipsis

Pattern (27) finds all occurrences of sentences with an elliptical (i.e., zero pronominalised) first object.

(27): /\*/ > (/^NP-OB1/ !< /\*T\*/)

Pattern (28) finds all occurrences of imperative sentences with an omitted subject.

(28): /\*hearer\*/ > (/NP-SBJ/ > (/IP-SUB/ > /CP-IMP/))

Pattern (29) finds all occurrences of sentences which have an explicit subject with a dropped particle (は, が, etc.)

(29): /^NP-SBJ/ !< /\*/ !$. /^P\b/

Pattern (30) finds all occurrences of sentences with an omitted predicate.

(30): /^IP/ !< /VB|ADJ|NP-PRD/ !< (/^AX\b/ < /^だ|^で/)

10 Noun Clauses

Pattern (31) finds all occurrences of noun clauses headed by the noun こと.

(31): こと > (/^N\b/ $, /^IP-EMB\b/)

*There are a few occurrences of こと which are tagged with FN. An investigation is still needed on the difference between the tags N and FN.

Pattern (32) finds all occurrences of noun clauses introduced by ということ.

(32): こと > (/^N\b/ $, (/^CP-THT\b/ < (/^P-COMP\b/ < という)))

Pattern (33) finds all occurrences of subject noun clauses headed by the noun の.

(33): の > (/^N\b/ $, /^IP-EMB\b/ > (/NP/ > /PP-SBJ/))

*For IPs which appear in a nominal environment without any accompanying nouns or nominalizers, see 10.2 Nominalized IPs (IP-NMZ) of the manual. See also how targetless relative clauses are treated in this section.

Pattern (34) finds all occurrences of pseudo-cleft constructions … のは (が) … だ.

(34): /PP-SBJ/ < (/^NP/ < (/^N/ < の)) $.. /^NP-PRD/

Pattern (35) finds all occurrences of sentences whose first object is a question clause introduced by the final particle か.

(35): /CP-QUE-OB1/ < (/P-FINAL/ < か)

Pattern (36) finds all occurrences of sentences whose subject is a question clause with a WH word.

(36): /CP-QUE-SBJ/ << /^W/

Pattern (37) finds all occurrences of sentences in which a complementiser clause, being headed by the complementiser と, is the first object of a verb for communication or recognition.

(37): /CP-THT-OB1/ < (/^P-COMP/ < と)

11 Adverbial Clauses

Pattern (38) finds all occurrences of sentences with conditional clauses headed by the conjunctional particle ば.

(38): /IP-ADV-SCON-CND/ < (/P-CONN/ < ば)

Pattern (39) finds all occurrences of sentences with subordinate clauses headed by the conjunctional particle から.

(39): /PP-SCON/ < (/P-CONN/ < から)

12 Relative Clauses

Pattern (40) finds all occurrences of cases in which the first object is relativised (i.e., where the head noun plays the role of the first object in the relative clause).

(40): /^N\b/ $,, (/IP-REL/ < (/NP-OB1/ < /\*T\*/))

Pattern (41) finds all occurrences of cases in which a relative clause is not immediately followed by a head noun, but one or more words intervene between them.

(41): /^N\b/ $,, (__ $, /IP-REL/)

Pattern (42) finds all occurrences of cases in which a head noun plays an optional grammatical role (a grammatical role other than the subject, first object, or second object) in the relative clause.

(42): /^IP-REL/ !< (/NP-(SBJ|SBJ2|LGS|DSBJ|OB1|DOB1|OB2|CZZ)\b/ < /\*T\*/)

Pattern (43) finds all occurrences of relativisation as long distance dependency, i.e., relative clauses embedding a clause containing a trace. Cases are excluded in which the embedded clause is adjectival.

(43): /^IP-REL\b/ << (/^IP/ < (/^NP/ < /\*T\*/) !< /ADJ/)

*The head noun modified by the relative clause may possibly not be coreferential with the trace. That is, the embedded clause may be a relative clause that is paired with a head noun coreferential with the trace.

Pattern (44) finds all occurrences of relative clauses without a trace headed by わけ which is not a formal noun, but a common noun.

(44): わけ > (/^N\b/ $,, /^IP-EMB\b/)