NLP Linguistics Basics
## Linguistics Basics

Natural Language Processing (NLP), an important branch of artificial intelligence, aims to enable computers to understand, interpret, and generate human language. To master NLP, one must first understand the fundamental principles that govern human language, which are the subject matter of linguistics.

Linguistics provides NLP with theoretical frameworks and analytical tools, covering the following key areas:

1. **Phonetics**: Studies the physical properties and production of speech sounds
2. **Phonology**: Studies how sounds pattern and combine into meaningful linguistic units
3. **Morphology**: Studies the internal structure of words
4. **Syntax**: Studies sentence structure
5. **Semantics**: Studies the meaning of language
6. **Pragmatics**: Studies language use in context

* * *

## Phonetics & Phonology

### Phonetics

Phonetics studies the physical properties and production mechanisms of speech sounds, focusing on their acoustic and physiological characteristics.

**Articulatory Organs and Manners of Articulation**

* **Articulatory Organs**:
  * Lungs: Provide airflow
  * Larynx: Vocal cord vibration produces voiced sounds
  * Oral cavity: Tongue, teeth, and lips regulate airflow
  * Nasal cavity: Produces nasal sounds

**Consonant Classification**

Consonants are classified by manner and place of articulation:

* **Manner of Articulation**:
  * Stops: /p/, /b/, /t/, /d/, /k/, /g/
  * Fricatives: /f/, /v/, /s/, /z/
  * Affricates: /ts/, /tʃ/
  * Nasals: /m/, /n/, /ŋ/
  * Laterals: /l/
  * Trills: /r/

* **Place of Articulation**:
  * Bilabial: /p/, /b/, /m/
  * Labiodental: /f/, /v/
  * Alveolar: /t/, /d/, /n/
  * Velar: /k/, /g/, /ŋ/

**Vowel Classification**

Vowels are classified by tongue position and lip shape:

* **Tongue Height**: High vowels /i/, /u/; mid vowels /e/, /o/; low vowel /a/
* **Tongue Backness**: Front vowels /i/, /e/; central vowel /a/; back vowels /u/, /o/
* **Lip Rounding**: Rounded /u/, /o/; unrounded /i/, /e/, /a/

**Characteristics of Chinese Phonetics**

* **Tonal Language**: Tone distinguishes meaning
  * Four basic tones in Mandarin: level, rising, falling-rising, falling
  * Example: 妈 mā (mother), 麻 má (hemp), 马 mǎ (horse), 骂 mà (scold)

* **Syllable Structure**: Initial + Final
  * Initials: 21 basic initials
  * Finals: 39 basic finals

### Phonology

Phonology studies the structure and patterns of sound systems, focusing on the function of sounds within a particular language.

**Phoneme**

* The smallest unit of sound that can distinguish meaning
* Example: In English, /p/ and /b/ are different phonemes ("pit" vs. "bit")
* In Chinese, tone also distinguishes meaning and is treated as part of the phonemic system

**Allophone**

* A phonetic realization of a phoneme in a particular environment
* Example: English /p/ is aspirated in "pin" but unaspirated in "spin"

**Phonological Rules**

* Describe the patterns of sound change in specific environments
* Example: The tone sandhi rules of "一" (yī, "one") in Chinese
* English plural alternation: cats /s/, dogs /z/, horses /ɪz/

**Applications in NLP**

* **Speech Recognition**: Converting acoustic signals to text
* **Speech Synthesis**: Converting text to speech
* **Pinyin-to-Character Conversion**: Converting pinyin to Chinese characters
* **Prosodic Analysis**: Computational analysis of poetic meter

* * *

## Morphology

Morphology studies the internal structure of words and the patterns by which words are formed. It is the foundation of lexical analysis.

### Basic Concepts

**Morpheme**

* The smallest meaningful grammatical unit in language
* **Free morpheme**: Can be used independently, such as "book" and "run"
* **Bound morpheme**: Must attach to other morphemes, such as prefixes and suffixes

**Root, Affix, and Stem**

* **Root**: The core bearer of word meaning
  * Example: In "unhappiness", "happy" is the root

* **Affix**:
  * Prefix: un-, re-, pre-
  * Suffix: -ness, -tion, -ly
  * Infix: Rare; found, for example, in Tagalog

* **Stem**: The form that remains after removing inflectional affixes

### Word Formation Methods

**Derivation**

* Changing word class or meaning by adding derivational affixes
* Examples:
  * happy → unhappy (adding a negative prefix)
  * happy → happiness (nominalizing suffix)
  * teach → teacher (agent suffix)

**Compounding**

* Combining two or more roots to form a new word
* English examples:
  * blackboard (black + board)
  * laptop (lap + top)

* Chinese examples:
  * 电脑 "computer" (电 electricity + 脑 brain)
  * 手机 "mobile phone" (手 hand + 机 machine)

**Inflection**

* Changing the grammatical form of a word without changing its basic meaning
* English verb inflection: walk, walks, walked, walking
* Noun plurals: book → books, child → children
* Chinese has relatively little inflection

### Characteristics of Chinese Morphology

**Concept of Word**

* Word boundaries in Chinese are relatively vague
* The boundaries between character, word, and phrase are not as clear as in English
* Example: "研究生" (graduate student) can be treated as one word, or analyzed as "研究" (research) + "生" (student)

**Word Formation Methods**

* **Compounding as the main method**:
  * Modifier-head: 火车 "train" (火 fire + 车 vehicle)
  * Verb-object: 司机 "driver" (司 operate + 机 machine)
  * Subject-predicate: 地震 "earthquake" (地 earth + 震 shake)
  * Coordinate: 朋友 "friend" (朋 friend + 友 friend)

* **Reduplication**:
  * Verb reduplication: 看看 (have a look), 走走 (take a walk)
  * Adjective reduplication: 红红的 (nice and red), 慢慢地 (slowly)
  * Noun reduplication: 人人 (everyone), 事事 (everything)

**Conversion**

* The same character or word can serve different grammatical functions
* Example: "水" (water) is a noun in "喝水" (drink water) but is used attributively in "水田" (paddy field)

### Applications in NLP

**Stemming**

* Reducing words to their stem form by stripping affixes
* Porter algorithm: Reduces "running" and "runs" to "run"; irregular forms such as "ran" are left unchanged because no suffix rule applies

**Lemmatization**

* Reducing words to their dictionary form (lemma), e.g., "ran" → "run"
* More accurate than stemming because it takes part-of-speech information into account

**Chinese Word Segmentation**

* Since Chinese words are not separated by spaces, segmentation is required
* Methods are based on dictionaries, statistics, or neural networks

**Part-of-Speech Tagging**

* Determining the grammatical category of each word
* Providing basic information for syntactic analysis

* * *

## Syntax

Syntax studies how sentences are structured and organized. It is central to understanding the grammar of a language.

### Basic Concepts

**Phrase Structure**

* **Noun Phrase (NP)**: Phrase with a noun as head
  * Example: that interesting book

* **Verb Phrase (VP)**: Phrase with a verb as head
  * Example: running quickly

* **Prepositional Phrase (PP)**: Phrase with a preposition as head
  * Example: on the table

* **Adjective Phrase (AP)**: Phrase with an adjective as head
  * Example: very beautiful

**Sentence Constituents**

* **Subject**: What the sentence is about, typically the doer of the action
* **Predicate**: Describes the action or state of the subject
* **Object**: The receiver of the action
* **Attributive**: Modifies a noun
* **Adverbial**: Modifies a verb or adjective
* **Complement**: Provides supplementary explanation

### Syntactic Analysis Methods

**Phrase Structure Grammar**

* Describes sentence structure using rewrite rules
* Example:
  * S → NP VP
  * NP → Det N
  * VP → V NP
  * Det → the, a, an
  * N → cat, dog, book
  * V → chase, read

**Dependency Grammar**

* Focuses on dependency relationships between words
* Each word depends on a head word (except the root node)
* Example: In "小猫追老鼠" (literally "small cat chases mouse"):
  * "追" (chase) is the root node
  * "猫" (cat) depends on "追" (subject-predicate relation)
  * "老鼠" (mouse) depends on "追" (verb-object relation)
  * "小" (small) depends on "猫" (modifier-head relation)

**Tree Representation**

          chase
         /     \
      cat       mouse
      /
  small

### Characteristics of Chinese Syntax

**Word Order Features**

* Basic word order: Subject-Verb-Object (SVO)
* Modifiers precede the elements they modify, typically linked by "的" (de)
* Example: "那个穿红衣服的漂亮女孩" (that beautiful girl wearing red clothes)

**Special Structures**

* **Ba-construction**: 把 (bǎ) + object + verb
  * Example: "我把书放在桌子上" (I put the book on the table)

* **Bei-construction**: subject + 被 (bèi, passive marker) + agent + verb
  * Example: "书被我放在桌子上" (The book was put on the table by me)

* **Existential sentence**: Expresses existence or appearance
  * Example: "桌子上放着一本书" (On the table there is a book)

**Degree of Grammaticalization**

* Chinese has a relatively low degree of grammaticalization
* Word order and context play important roles in expressing grammatical relations
* It lacks rich inflectional morphology

### Challenges in Syntactic Analysis

**Ambiguity**

* **Structural ambiguity**: A sentence can have multiple syntactic analyses
* Example: "I saw the person with the telescope" (我看见了拿着望远镜的人)
  * Analysis 1: I used a telescope to see the person
  * Analysis 2: I saw a person who was holding a telescope

**Long-distance Dependencies**

* Dependencies between sentence constituents may span large distances
* Example: In the question "什么书你昨天买了？" (What book did you buy yesterday?), "什么" (what) has a dependency relation with "买" (buy)

**Ellipsis**

* Chinese often omits subjects or other constituents
* Example: "（我）昨天看了电影，（电影）很好看" ((I) watched a movie yesterday; (the movie) was very good)

### Applications in NLP

**Parsers**

* **Rule-based**: Using manually written grammar rules
* **Statistical methods**: Probability models based on annotated corpora
* **Deep learning**: End-to-end learning using neural networks

**Treebanks**

* Penn Treebank (English)
* Chinese Treebank (CTB)
* Both provide training and evaluation data for syntactic analysis

**Application Tasks**

* **Machine Translation**: Understanding the syntactic structure of the source language
* **Information Extraction**: Extracting information based on syntactic patterns
* **Question Answering Systems**: Understanding the syntactic structure of questions

* * *

## Semantics

Semantics studies the meaning of language and is the core of natural language understanding.

### Basic Concepts

**Lexical Semantics**

* **Word meaning**: The concept expressed by a word
* **Polysemy**: One word having multiple related meanings
* **Synonymy**: Different words expressing the same or similar meanings
* **Antonymy**: Opposite relationships between words
* **Hyponymy**: Inclusion relationships between concepts

**Semantic Relations**

* **Synonyms**:
  * Complete synonyms: Rare
  * Near-synonyms: happy-glad, big-huge

* **Antonyms**:
  * Complementary: dead-alive, male-female
  * Gradable: cold-hot, big-small
  * Relational: teacher-student, buy-sell

* **Hyponyms**:
  * Hypernym (superordinate term): animal, color
  * Hyponym (subordinate term): dog, cat (hyponyms of animal)

### Sentence Semantics

**Proposition**

* The basic semantic content expressed by a sentence
* Example: The proposition expressed by "小明在图书馆看书" (Xiao Ming reads a book in the library):
  * Agent: Xiao Ming
  * Action: read
  * Patient: book
  * Location: library

**Semantic Roles**

* **Agent**: The doer of the action
* **Patient**: The receiver of the action
* **Instrument**: The tool used to complete the action
* **Location**: Where the action takes place
* **Time**: When the action occurs
* **Manner**: How the action is performed

**Argument Structure**

* The semantic participants required by a verb
* Example: The verb "give" requires three arguments:
  * Giver (agent)
  * Receiver (recipient)
  * Thing given (patient)

### Semantic Representation

**Logical Representation**

* Using logical formulas to represent meaning
* Example: "有人很高兴" (Someone is happy) in first-order logic: ∃x (Person(x) ∧ Happy(x))

**Frame Semantics**

* Understanding meaning based on cognitive frames
* FrameNet project: Building frame-based semantic resources
* Example: The commercial transaction frame includes buyer, seller, goods, price, etc.

**Conceptual Graphs**

* Using graph structures to represent concepts and relations
* Nodes represent concepts, edges represent relations
* Suitable for representing complex semantic networks

### Semantic Ambiguity

**Lexical Ambiguity**

* Polysemy: bank (financial institution / river bank)
* Homophones: 他们 / 它们 (both pronounced tāmen; "they" for people vs. "they" for things)

**Structural Ambiguity**

* Modifier ambiguity: "漂亮的女孩的衣服" (the beautiful girl's clothes, or the girl's beautiful clothes)
* Scope ambiguity: "All the students did not pass" (every student failed, or not every student passed)

**Pragmatic Ambiguity**

* Requires context to determine meaning
* Example: Pronoun reference, recovery of elided constituents

### Applications in NLP

**Lexical Semantic Resources**

* **WordNet**: English lexical semantic network
* **HowNet**: Chinese lexical semantic knowledge base
* **Tongyici Cilin**: Chinese synonym classification system

**Semantic Analysis Tasks**

* **Word Sense Disambiguation**: Determining the meaning of polysemous words in specific contexts
* **Semantic Role Labeling**: Identifying semantic roles in sentences
* **Semantic Similarity Computation**: Calculating semantic similarity between words or sentences

**Application Areas**

* **Question Answering Systems**: Understanding the semantic intent of questions
* **Machine Translation**: Maintaining semantic consistency in translation
* **Information Retrieval**: Relevance matching based on semantics

* * *

## Pragmatics

Pragmatics studies language use in specific communicative situations, focusing on how context affects meaning.

### Basic Concepts

**Context**

* **Linguistic context**: The surrounding text or discourse
* **Situational context**: The specific situation of communication
* **Cultural context**: Social and cultural background

**Speech Act Theory**

* **Locutionary act**: The act of speaking itself
* **Illocutionary act**: The purpose intended by speaking
* **Perlocutionary act**: The effect produced by speaking

**Types of Illocutionary Acts**

* **Assertives**: Stating facts, e.g., "It rained today"
* **Directives**: Requesting action, e.g., "Please close the door"
* **Commissives**: Committing to future action, e.g., "I will come tomorrow"
* **Expressives**: Expressing attitudes, e.g., "Congratulations!"
* **Declaratives**: Changing the state of affairs, e.g., "I declare the meeting open"

### Pragmatic Phenomena

**Deixis**

* Linguistic elements whose reference depends on context
* **Person deixis**: I, you, he
* **Time deixis**: now, yesterday, tomorrow
* **Space deixis**: here, there, above
* **Discourse deixis**: "as mentioned above", "in summary"

**Presupposition**

* Information that the speaker assumes the hearer already knows
* Example: "Xiao Ming's wife is beautiful" presupposes that Xiao Ming is married

**Implicature**

* **Conversational implicature**: Implicit meaning generated by flouting a maxim of the Cooperative Principle
* Example: A: "Do you know what time it is?" B: "I do."
  * B's answer violates the Maxim of Quantity, implying unwillingness to reveal the time

**Cooperative Principle**

* **Maxim of Quantity**: Provide an appropriate amount of information
* **Maxim of Quality**: Tell the truth
* **Maxim of Relation**: Be relevant
* **Maxim of Manner**: Be clear

### Characteristics of Chinese Pragmatics

**Politeness Strategies**

* Chinese places great emphasis on the Politeness Principle
* Use of euphemisms and honorifics
* Example: "May I ask...", "May I trouble you...", "Sorry"

**High-context Culture**

* Relies on context to convey meaning
* Expression tends to be implicit rather than direct
* Example: Chinese refusals are often indirect

**Face Theory**

* Positive face: The need to be approved of
* Negative face: The need to be free from imposition
* Both influence the choice of speech acts

### Applications in NLP

**Dialogue Systems**

* **Intent Recognition**: Understanding the user's true intention
* **Slot Filling**: Extracting key information from dialogue
* **Dialogue Management**: Controlling the dialogue flow

**Sentiment Analysis**

* **Implicit Sentiment**: Identifying indirectly expressed emotions
* **Irony Detection**: Understanding the true meaning of ironic statements

**Machine Translation**

* **Pragmatic Equivalence**: Maintaining the pragmatic function of the source text
* **Cultural Adaptation**: Considering the cultural characteristics of the target language

* * *

## Characteristics of the Chinese Language

Chinese, as a representative of the Sino-Tibetan language family, has distinctive linguistic characteristics that bring special challenges and opportunities for Chinese NLP.

### Necessity of Word Segmentation

**No Space Delimiters**

* There are no explicit delimiters between Chinese words
* Example: 「我爱北京天安门」 needs to be segmented into 「我/爱/北京/天安门」 (I / love / Beijing / Tiananmen)
* This differs from English and other languages in which spaces mark word boundaries

**Complex Concept of Word**

* The boundaries between character, word, and phrase are vague
* Example: 「研究」 (research) can be a word, and 「研究生」 (graduate student) can also be a word
* Context affects segmentation results

### Segmentation Ambiguity

**Combinatorial Ambiguity**

* The same character sequence can be segmented in different ways
* Example: 「结婚的和尚未结婚的」 (the married and the not-yet-married)
  * Incorrect segmentation: 结婚的/和尚/未结婚的 (the married / monk / unmarried)
  * Correct segmentation: 结婚的/和/尚未/结婚的 (the married / and / not yet / married)

**Intersection Ambiguity**

* Adjacent candidate words overlap
* Example: 「长春市长春药店」
  * Scheme 1: 长春市/长春/药店 (Changchun City / Changchun / pharmacy)
  * Scheme 2: 长春/市长/春药店 (Changchun / mayor / Chun pharmacy)

**True Ambiguity**

* Different segmentation results are all grammatically and semantically valid
* Example: 「乒乓球拍卖完了」
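
To make the ambiguity concrete, here is a minimal sketch of dictionary-based segmentation using forward and backward maximum matching, two simple greedy strategies. It is an illustration only, not a production segmenter: the tiny `VOCAB` set, the `max_len` parameter, and both function names are assumptions made for this example. The test string is the classic 结婚的和尚未结婚的 ("the married and the not-yet-married"), where 和尚 ("monk") and 尚未 ("not yet") overlap on the character 尚.

```python
def forward_max_match(text, vocab, max_len=4):
    """Greedy left-to-right segmentation: take the longest dictionary word at each position."""
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate first; fall back to a single character.
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in vocab:
                tokens.append(piece)
                i += size
                break
    return tokens


def backward_max_match(text, vocab, max_len=4):
    """Greedy right-to-left segmentation: take the longest dictionary word ending at each position."""
    tokens = []
    j = len(text)
    while j > 0:
        for size in range(min(max_len, j), 0, -1):
            piece = text[j - size:j]
            if size == 1 or piece in vocab:
                tokens.append(piece)
                j -= size
                break
    tokens.reverse()
    return tokens


# Hypothetical toy dictionary, just large enough for this sentence.
VOCAB = {"结婚", "和尚", "尚未", "的", "和", "未"}
SENTENCE = "结婚的和尚未结婚的"

if __name__ == "__main__":
    print("/".join(forward_max_match(SENTENCE, VOCAB)))
    print("/".join(backward_max_match(SENTENCE, VOCAB)))
```

With this toy dictionary the forward pass greedily picks 和尚 ("monk") and produces the unintended reading, while the backward pass picks 尚未 ("not yet") and recovers the intended one. Neither direction is right in general, which is one reason practical segmenters add statistical or neural scoring on top of a dictionary.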