"grammar mining"

This is a repository for things, which are related to the systematic extraction of information from grammatical descriptions. This means linguistic descriptions of languages are the target; not to be confused with the computational term "grammar mining".

Workflow Documentation

A couple of text (markdown) files describe the workflow used in the project. The basic outline is described in workflow.md. Specific steps of the workflow are explained in more depth in the files Creating_bibliographies.md, duplicate_reduction.md and Acquire_sources.md.

Defining the scope

The R script SA_scope_languoids.R is used for the filtering of languoids used in the sample (using input from Glottolog), cf. workflow.md.

Link extraction

As input for the download script, we need a list of filenames (stored as csv). This list is extracted from a bib file (cf. Creating_bibliographies.md) using filename_extraction_glottolog_v3.R.

One version of the output of the filename extraction can be found here as well (pigritiae causa vel for reasons of convenience): SA_alldoctypes_bestfn.csv. Note that this is not necessarily the output of the current version of the script and it doesn't claim to be; it's only there for testing the script below without having to create a file yourself.

Download script

THe file downloading_script_v2.2.4_flexible.py is a python3 script, which takes as input a csv file (cf. above) and downloads the specified items from the (not publically accessible!) website gramfinder. If you want to customize the script: The input file needs to specify the latter part of the URL (i.e. the folder), in the format "/macro-area/author_keywordYear(_suffix).pdf" like "eurasia\zydenbos_kannada2011_o.pdf" (can be found in the bib-files from Glottolog). (Note: this is my first python script; it's rather basic, and you may receive a warning about the security of the session - however, as I have been told by the owner of the website, this is at current unavoidable. You may, of course, report any issues via an issue, DM or pull request.)

Note: In case that there is really someone out there who wants to make use of those scripts and does not only see them as the documentation of the scientific work of our project, let me say that a) there is some information on how to use the scripts / what to be aware of in the scripts themselves (comment lines), and feel free to ask me or point me towards any inconsistencies, problems etc.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

"grammar mining"

Workflow Documentation

Defining the scope

Link extraction

Download script

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Name		Name	Last commit message	Last commit date
Latest commit History 24 Commits
Bibliography_290423		Bibliography_290423
Downloading		Downloading
Acquire_sources.md		Acquire_sources.md
Creating_bibliographies.md		Creating_bibliographies.md
README.md		README.md
SA_alldoctypes_bestfn.csv		SA_alldoctypes_bestfn.csv
SA_scope_languoids.R		SA_scope_languoids.R
duplicate_reduction.md		duplicate_reduction.md
filename_extraction_glottolog_v3.R		filename_extraction_glottolog_v3.R
workflow.md		workflow.md

Folders and files

Latest commit

History

Repository files navigation

"grammar mining"

Workflow Documentation

Defining the scope

Link extraction

Download script

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages