Intro
- This page explains how the parser actually works.
1. Complete Summary
2. Dictionaries
2.1. The Core Dictionary
2.2. Extra Names Dictionary
2.3. All French Verbs
2.4. Analysis for Noun and Adjective Endings
2.5. Example of Lexicon Entries
3. Simple Syntactic Rules for a French Parser in Lisp
1. Complete Summary
- First there is a short but complete description of the parser (25 pages).
- Read it as: pdf, dvi
- It covers the following topics:
1 The Goal
2 The Dictionary
2.1 FOSS French Dictionary
2.2 Lexique3
2.2.1 Lexique 3.21
2.2.2 Prénoms 1.00
2.3 Perl Script For Adaption to Lisp Structure
2.3.1 Prénoms Perl Script
2.3.2 Lexique Perl Script
2.4 Database Access Functions in Main Module
2.5 Verbs from Verbiste
2.5.1 The Database
2.5.2 Transformation for Lisp
2.6 Database Access Functions for Verbs
2.7 Faster Startup Time with Memory Image
2.8 Suffix Analysis
2.9 Summary: Unknown Words
3 Pretreatment
3.1 French Word Fusions
3.2 The prepare Function
4 Parsing with Rules
4.1 The Syntax Rules
4.2 Meta Rules
4.3 The Core Parsing Algorithm
4.4 Improvements to the Parsing Algorithm
4.5 The Memoization Facility
5 Statistics - How to Kill Ambiguities
5.1 The Idea
5.2 The showRank function
6 Output
6.1 The Goal
6.2 Latex Trees
6.3 System Calls and Perl Picture creation
6.4 The clisp html and http Addon
6.5 The final cgi File
7 Installation and Configuration
7.1 CLISP
7.2 Latex
7.3 textogif and png
7.4 Apache Webserver Configuration
7.5 Last steps
8 Improvements for Future Versions
8.1 Regular Expressions in Rules
8.2 Automated Testing
8.3 Machine Learning
9 Credits
2. Dictionaries
2.1. The Core Dictionary
- The dictionary is based on Lexique, the only freely available French dictionary I found that includes core information about each word.
- You can check whether a word is in the dictionary.
- Although the information on verbs could be richer (transitivity, for example, is missing), it provides a good basis for combining rule-based parsing with statistics-based disambiguation.
2.2. Extra Names Dictionary
- I also added 11627 first names from many different countries, taken from the Prénoms 1.00 project, which is a subproject of Lexique.
2.3. All French Verbs
- In case a rare verb form is missing from the corpus-based Lexique dictionary, I also added a verb analyzer based on the verb list provided by Verbiste.
- It contains more than 6000 verbs in an XML file, with templates that describe the conjugation of each verb. I wrote a small Perl script to transform this database into a Lisp hash table.
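The template idea can be sketched roughly as follows. This is only an illustration under assumptions: the table names, the functions `verb-stem`, `conjugate` and `analyze-verb-form`, the two-table layout, and the choice of the template name "aim:er" for "soigner" are hypothetical and are not taken from the actual Perl output. The feature triples follow the notation used in the lexicon entries of section 2.5.

```lisp
;; Hypothetical tables; the real Verbiste-derived data is much larger
;; and may be organised differently.
(defvar *verb-templates* (make-hash-table :test #'equal))
(defvar *template-endings* (make-hash-table :test #'equal))

;; infinitive -> template name; the part after ":" is the infinitive
;; ending that is replaced by the conjugated endings.
(setf (gethash "soigner" *verb-templates*) "aim:er")

;; template name -> list of (ending mood tense person)
(setf (gethash "aim:er" *template-endings*)
      '(("e" ind pre 1s) ("es" ind pre 2s) ("e" ind pre 3s)
        ("ons" ind pre 1p) ("ez" ind pre 2p) ("ent" ind pre 3p)))

(defun verb-stem (infinitive template)
  "Strip the infinitive ending named by TEMPLATE from INFINITIVE."
  (let* ((colon (position #\: template))
         (inf-ending (subseq template (1+ colon))))
    (subseq infinitive 0 (- (length infinitive) (length inf-ending)))))

(defun conjugate (infinitive)
  "Return a list of (form mood tense person), or NIL if the verb is unknown."
  (let ((template (gethash infinitive *verb-templates*)))
    (when template
      (let ((stem (verb-stem infinitive template)))
        (mapcar (lambda (row)
                  (cons (concatenate 'string stem (first row)) (rest row)))
                (gethash template *template-endings*))))))

(defun analyze-verb-form (form infinitive)
  "Return every (mood tense person) under which FORM is a form of INFINITIVE."
  (loop for (candidate . features) in (conjugate infinitive)
        when (string= candidate form)
          collect features))
```

With these sample tables, `(analyze-verb-form "soignez" "soigner")` yields the single analysis `(ind pre 2p)`, while an ambiguous form such as "soigne" yields two analyses.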
2.4. Analysis for Noun and Adjective Endings
- With the help of the suffix list provided by the Orbilat project, I wrote suffix analysis functions that try to guess whether an unknown word is a noun or an adjective.
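A minimal sketch of such a suffix guess is shown below. The suffix table is a tiny hand-picked sample and not the Orbilat list, and the names `*suffix-categories*`, `ends-with-p` and `guess-category` are hypothetical. Preferring the longest matching suffix is one possible design choice, not necessarily the one used in the parser.

```lisp
;; Hypothetical sample table of (suffix . category); illustrative only.
(defparameter *suffix-categories*
  '(("tion" . NOM) ("isme" . NOM) ("age" . NOM)
    ("eux" . ADJ) ("able" . ADJ) ("ible" . ADJ) ("ique" . ADJ)))

(defun ends-with-p (word suffix)
  "True if the string WORD ends with the string SUFFIX."
  (let ((start (- (length word) (length suffix))))
    (and (>= start 0)
         (string= suffix word :start2 start))))

(defun guess-category (word)
  "Return NOM or ADJ for the longest matching suffix of WORD, else NIL."
  (let ((best nil))
    (dolist (pair *suffix-categories* (cdr best))
      (when (and (ends-with-p word (car pair))
                 (or (null best)
                     (> (length (car pair)) (length (car best)))))
        (setf best pair)))))
```

For example, `(guess-category "courageux")` matches the suffix "eux" and `(guess-category "motivation")` matches "tion"; a word with no listed suffix gives NIL, so the caller can fall back to another strategy.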
2.5. Example of Lexicon Entries
- (setf(gethash "soigneux" *dict*)'((ADJ "soigneux" m NIL ((NIL )) 2.48)))
- (setf(gethash "soignez" *dict*)'((VER "soigner" NIL p ((imp pre 2p)(ind pre 2p)()) 3.54)))
- (setf(gethash "soigniez" *dict*)'((VER "soigner" NIL p ((ind imp 2p)(sub pre 2p)()) 0.12)))
- (setf(gethash "soignons" *dict*)'((VER "soigner" NIL p ((ind pre 1p)()) 0.06)))
- (setf(gethash "soignèrent" *dict*)'((VER "soigner" NIL p ((ind pas 3p)()) 0.27)))
- (setf(gethash "soignés" *dict*)'((ADJ "soigné" m p ((NIL )) 1.15)(VER "soigner" m p ((par pas)()) 1.47)))
- (setf(gethash "soi-même" *dict*)'((PRO_per "soi-même" NIL s ((NIL )) 34.47)))
- (setf(gethash "soin" *dict*)'((NOM "soin" m s ((NIL )) 112.94)))
- (setf(gethash "soins" *dict*)'((NOM "soin" m p ((NIL )) 36.51)))
- (setf(gethash "soir" *dict*)'((NOM "soir" m s ((NIL )) 1082.25)))
- (setf(gethash "soirée" *dict*)'((NOM "soirée" f s ((NIL )) 153.61)))
- (setf(gethash "soirées" *dict*)'((NOM "soirée" f p ((NIL )) 25.98)))
- (setf(gethash "soirs" *dict*)'((NOM "soir" m p ((NIL )) 53.12)))
- (setf(gethash "sois" *dict*)'((AUX "être" NIL s ((sub pre 2s)()) 61)(VER "être" NIL s ((imp pre 2s)(sub pre 1s)(sub pre 2s)()) 248.05)))
- (setf(gethash "soissonnais" *dict*)'((ADJ "soissonnais" m NIL ((NIL )) 0.07)(NOM "soissonnais" m NIL ((NIL )) 0.14)))
- (setf(gethash "soit" *dict*)'((AUX "être" NIL s ((sub pre 3s)()) 182.88)(VER "être" NIL s ((sub pre 3s)()) 474.86)(CON "soit" NIL NIL ((NIL )) 98.76)(ADV "soit" NIL NIL ((NIL )) 18.46)))
- (setf(gethash "soixantaine" *dict*)'((NOM "soixantaine" f s ((NIL )) 4.95)))
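The entries above can be read back with ordinary hash-table access. The following sketch assumes that each reading has the layout (category lemma gender number features frequency), which is my reading of the examples rather than a documented format. The helper names are hypothetical; only `*dict*` appears in the entries themselves.

```lisp
;; String keys require an EQUAL hash table.
(defvar *dict* (make-hash-table :test #'equal))

;; One entry copied from the examples above.
(setf (gethash "soissonnais" *dict*)
      '((ADJ "soissonnais" m NIL ((NIL)) 0.07)
        (NOM "soissonnais" m NIL ((NIL)) 0.14)))

(defun lookup-word (word)
  "Return the list of readings stored for WORD, or NIL if it is unknown."
  (values (gethash word *dict*)))

;; Accessors for the assumed field positions of one reading.
(defun reading-category (reading) (first reading))
(defun reading-lemma (reading) (second reading))
(defun reading-frequency (reading) (sixth reading))

(defun most-frequent-reading (word)
  "Return the reading of WORD with the highest frequency value, or NIL."
  (let ((readings (lookup-word word)))
    (when readings
      (reduce (lambda (a b)
                (if (>= (reading-frequency a) (reading-frequency b)) a b))
              readings))))
```

Picking the most frequent reading is only the simplest possible use of the frequency field; the statistical ranking described in the summary document may well work differently.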
3. Simple Syntactic Rules for a French Parser in Lisp
- These are the syntax rules used by Synthia.
- Feel free to add or adjust the rules. If your changes make sense, I will upload them into the online French Parser.
- #|
- -- Categories: --
- ADJ Adjectif
- ADJ_dem Adjectif démonstratif: ce, cette, cet (these are determiners, see SDET)
- ADJ_ind Adjectif indéfini: chaque, différentes, mainte
- ADJ_num Adjectif numérique
- ADJ_pos Adjectif possessif
- ADV Adverbe
- ART_def Article défini
- ART_inf Article indéfini
- AUX Auxiliaire
- CON Conjonction
- NOM Nom commun
- ONO Onomatopée
- PRE Préposition
- PRO_dem Pronom démonstratif
- PRO_ind Pronom indéfini
- PRO_int Pronom interrogatif
- PRO_per Pronom personnel
- PRO_rel Pronom relatif
- VER Verbe
- |#
- (defparameter *grammar*
- '((DET -> ART_def)
- (DET -> ART_inf)
- (DET -> ADJ_dem)
- (DET -> ADJ_pos)
- (DET -> ADJ_ind)
- (DET -> ADJ_num)
- (GDET -> DET)
- (GDET -> DET PRO_ind)
- (SDET -> GDET)
- (SDET -> PRO_ind GDET)
- (TITLE (-> title?) NOM)
- (SN -> TITLE NOM)
- (GN (-> eqN? eqG?) SADJ NOM)
- (GN (-> eqN? eqG?) NOM SADJ)
- (GN (-> eqN? eqG?) SADJ NOM SADJ)
- (GN -> NOM SP)
- (GN -> NOM SP_sn)
- (GN -> NOM)
- (GN -> NOM PHCONJ)
- (GN -> NOM PHREL)
- ; (GN (-> inf) VER) ; conflicting with: (SV_inf (-> inf) VER), perhaps this rule only with DET ?
- (GN (-> inf) DET VER)
- (SN (-> eqN? eqG?) SDET GN)
- (SN -> PRO_per)
- (SN -> PRO_rel)
- (SN -> GN)
- (SN -> ADJ_num)
- (SN -> PRO_ind)
- (SN -> SN virgule SN)
- (SN (-> conjCoo?) SN CON SN)
- (SN (-> conjCoo?) SN virgule SN CON SN)
- (SN -> NOM_propre) ; added from prenom.txt
- (SADJ -> SADV ADJ)
- (SADJ (-> conjCoo?) SADJ CON SADJ)
- (SADJ -> ADJ)
- ; for combined numbers like 'quatre mille'
- (ADJ_numC -> ADJ_num ADJ_num)
- (ADJ_numC -> ADJ_num ADJ_numC)
- (DET -> ADJ_numC)
- (SADJ_ph -> ADJ SP)
- (SADJ_ph -> ADV ADJ SP)
- (SADV (-> notNeg?) ADV)
- (SADV (-> notNeg?) PRE ADV) ; depuis aujourd'hui
- (SADV (-> notNeg?) ADV SP)
- (SADV (-> notNeg?) ADV PHCONJ) ; mieux 'que le dire'
- ; (SADV -> PRE SV_inf) ; PRE is in SP!
- ;------------------------------------
- ; Syntagme Prépositionel
- (GP -> PRE SN)
- (GP -> PRE SV_inf) ; Il faut se coucher pour se reposer.
- (GP_sv -> PRE_sv SV_inf)
- (GP_sv -> PRE_sv SN)
- (GP_sn -> PRE_sn SV_inf)
- (GP_sn -> PRE_sn SN)
- ; (SP (-> NconjCoo?) CON SN) ; pas une SP, mais PHCONJ
- (SP -> ADV GP)
- (SP -> GP)
- (SP_sv -> ADV GP_sv)
- (SP_sv -> GP_sv)
- (SP_sn -> ADV GP_sn)
- (SP_sn -> GP_sn)
- ;------------------------------------
- ;---------------------------
- ; Syntagme Verbal
- ; (SV (-> conjCoo?) SV CON SV)
- (SV -> VER SN)
- (SV -> VER SADV SN)
- (SV -> VER SN SP)
- (SV -> VER SN SP_sv) ; only here SP_sv is important, because of SN before, je donne le livre (à|de) Pierre.
- (SV -> VER SADV SN SP)
- (SV -> VER SADV SN SP_sv) ; only here SP_sv is important, because of SN before, je donne le livre (à|de) Pierre.
- (SV -> VER SP)
- (SV -> VER SP_sv); has to be here otherwise wrong solutions are more likely if together with AUX as with all other rules :/
- (SV -> VER SADV SP)
- (SV -> VER SADV SP_sv)
- (SV -> VER SP SP)
- (SV -> VER SP SP_sv) (SV -> VER SP_sv SP) (SV -> VER SP_sv SP_sv)
- (SV -> VER SADV SP SP)
- (SV -> VER SADJ_ph)
- (SV -> VER SADJ)
- (SV -> VER)
- (SV -> VER SV_inf) ; Ex.: Je veux aller à Sète.
- (SV -> VER PHCONJ) ; veux que phrase
- ;- verb not directly used to compose SV:
- (SV -> AUX SV)
- (SV -> AUX SADV SV)
- (SV (-> Pp=LeLaMeTe?) PRO_per SV)
- (SV (-> conjCoo?) SV CON SV)
- ;---------------------------
- ;---------------------------
- ; Syntagme Verbal with infinitive verb, Ex: Je veux 'rester dans le soleil'.
- (SV_inf (-> inf) VER)
- (SV_inf (-> inf) VER SN)
- (SV_inf (-> inf) VER SADV SN)
- (SV_inf (-> inf) PRO_per SV_inf)
- (SV_inf (-> inf) VER SN SP)
- (SV_inf (-> inf) VER SN SP_sv)
- (SV_inf (-> inf) VER SP)
- (SV_inf (-> inf) VER SP_sv)
- (SV_inf (-> inf) VER SP SP)
- (SV_inf (-> inf) VER SP_sv SP)
- (SV_inf (-> inf) VER SP SP_sv)
- (SV_inf (-> inf) VER SP_sv SP_sv)
- (SV_inf (-> inf) VER SADJ_ph)
- (SV_inf (-> inf) VER SADJ)
- ;(SV_inf (-> inf) VER SV) ;test if this rule is ok for avoir lu (INF + SV)
- ;---------------------------
- ;---------------------------
- ; Negation
- (VER (-> ADV=neg?) ADV VER ADV)
- (AUX (-> ADV=neg?) ADV AUX ADV)
- (SV (-> ADV=neg?) ADV PRO_per SV ADV)
- ; non negative ne
- (VER (-> ADV=ne?) ADV VER)
- (AUX (-> ADV=ne?) ADV AUX)
- (SV (-> ADV=ne?) ADV PRO_per SV)
- ;---------------------------
- ;---------------------------
- ; Ex: Je veux 'que nous allons'. SN: le fait 'que il travail' SADV: mieux que le dire
- (PHCONJ (-> NconjCoo?) CON PH_sub)
- (PHCONJ (-> NconjCoo?) CON SN)
- ;---------------------------
- ;---------------------------
- ; Phrase Relative
- (PHREL -> PRO_rel SV)
- (PHREL -> PRO_rel SV SADV)
- (PHREL (-> conjCoo?) PRO_rel SV CON SV)
- (PHREL (-> conjCoo?) PRO_rel SV SADV CON SV)
- ; (PHREL (-> conjCoo?) PRO_rel SV SADV CON SV)
- (PH_sub (-> eqN?) SN SV)
- (PH_sub (-> eqN?) SN SV SADV)
- (PH_sub (-> conjCoo?) PH_sub CON PH_sub)
- (PH_sub (-> conjCoo?) PH_sub virgule PH_sub)
- (PH_sub -> PRE SV_inf);après avoir lu le journal, il ...
- (PH (-> eqN?) SN SV POINT)
- (PH -> SN SV SADV POINT)
- (PH -> SADV SN SV POINT)
- (PH (-> conjCoo?) PH_sub CON PH_sub POINT)
- (PH (-> conjCoo?) PH_sub virgule PH_sub POINT)
- (PH -> CON PH) ; Ex: Mais je veux aller! or add CON for each rule?
- (PH_imp -> SV POINT) ; Imperative Phrases, Ex: Supprime ce dernier fichier.
- ;---------------------------
- ; Questions with Inversion: Copied from normal SV-rules, but VER and SN changed
- (SV_Q (-> conjCoo?) SV CON SV)
- (SV_Q -> SN SN)
- (SV_Q -> SN SADV SN)
- (SV_Q -> SN SN SP) (SV_Q -> SN SN SP_sv)
- (SV_Q -> SN SP) (SV_Q -> SN SP_sv)
- (SV_Q -> SN SP SP) (SV_Q -> SN SP_sv SP) (SV_Q -> SN SP SP_sv) (SV_Q -> SN SP_sv SP_sv)
- (SV_Q -> SN SADJ_ph)
- (SV_Q -> SN SADJ)
- (SV_Q -> SN)
- (SV_Q -> SN SV_inf) ; Ex.: veux-tu aller à Sète?
- (SV_Q -> SN PHCONJ) ; veux que phrase
- (SV_Q -> SN SV); dangerous, but needed for: A-t-il trouvé son sac?
- (PH_Q (-> eqN?) SV SV_Q PointInterrogation)
- (PH_Q -> SV SV_Q SADV PointInterrogation)
- (PH_Q (-> eqN?) AUX SV_Q PointInterrogation)
- (PH_Q (-> eqN?) AUX SV_Q SADV PointInterrogation)
- (PH_Q -> SADV SV SV_Q PointInterrogation)
- ;(Question (-> est-ceQue?) ADV SN SV PointInterrogation) ; for all other cases too... not needed
- ;---------------------------
- ;-------------------
- ; Indirect Speech
- ;------------------
- (PH -> PH_sub 2points D_cit PH_sub F_cit POINT)
- (PH -> PH_sub 2points SV POINT)
- ;---------------------------
- (PH -> ONO POINT) ; Ah!
- ))
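The rules come in two shapes: a plain rule such as `(SN -> PRO_per)` and a rule whose arrow carries test names, such as `(GN (-> eqN? eqG?) SADJ NOM)`. The sketch below shows one way to take such rules apart and to find the rules that could reduce a given sequence of categories. The accessor names and `*sample-rules*` are hypothetical and are not part of the parser. The meaning of tests such as `eqN?` and `eqG?` (presumably number and gender agreement) is an assumption.

```lisp
(defun rule-lhs (rule)
  "The category produced by RULE."
  (first rule))

(defun rule-tests (rule)
  "The test names attached to the arrow, or NIL for a plain rule."
  (let ((arrow (second rule)))
    (if (consp arrow) (rest arrow) nil)))

(defun rule-rhs (rule)
  "The sequence of categories that RULE consumes."
  (cddr rule))

(defun rules-matching (categories grammar)
  "Return the rules of GRAMMAR whose right-hand side equals CATEGORIES."
  (remove-if-not (lambda (rule) (equal (rule-rhs rule) categories))
                 grammar))

;; A few rules copied from the grammar above, for experimenting.
(defparameter *sample-rules*
  '((DET -> ART_def)
    (GN -> NOM)
    (GN (-> eqN? eqG?) SADJ NOM)
    (SN (-> eqN? eqG?) SDET GN)))
```

With these definitions, `(rules-matching '(SADJ NOM) *sample-rules*)` returns the single GN rule, and `rule-tests` on that rule returns the two test names that a parser would still have to check before accepting the reduction.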