April 11, 2008
We now have over 100,000 words in the database. Data entry and source acquisition have continued.
Things will go into a bit of a hiatus while I move the grant from Rice to Yale.
March 10, 2008
I see it’s been a while since I posted an update.
December 1, 2007
(The blog has had 463 hits and the website 163.)
November 27, 2007
Here is the first set of information regarding the database structure I am using for the comparative database. Links will follow in a separate post.
Tables:
* Sources
* Languages
* Data [Data I've imported, only Karnic at this stage]
* Reconstructions
* Personnel (not used in the end)
* Labor
* Notes (not really used much)
* BasicData [Basic vocab lists which RAs are typing]
* CurrData [Curr wordlists]
* CurrSources [Source information from Curr]
* BodyData [Body part data]
* CurrEnglishList [list of English glosses in the Curr lists to facilitate sorting]
TABLE_Languages
* LangRecordNumber [autogenerated]
* Variety [equivalent to doculect; the name used by the source]
* Language [standardised name]
* Notes (data quality and orthography notes. Not filled in at present)
* Subgroup
* Group
* Family [None of these are used at present; to be filled in as evidence emerges]
* Attested [to allow disambiguation of proto-languages from attested languages, since we'll be entering reconstructions from other sources]
* AIATSIS_Code
* Source [Links to source database, with bibliographic and other details]
TABLE_variety
* Variety [linked to Language Table, must exist in TABLE_Languages]
* RecordNumber
* OriginalForm [word as given in source]
* PhonemicisedForm [standardised; generated periodically by a script for recent sources, then checked]
* PartofSpeech
* ParadigmNote
* Gloss [gloss as given in source]
* SemanticField [standardised; I have about 15 at the moment]
* GeneralisedGloss ['cover' gloss, e.g. neutralising 'belly' vs 'stomach', 'angry' vs 'cheeky'. To be filled in by a script at some point]
* CognateCode [links to Reconstructions database]
* LoanCode [links to Reconstructions database]
* LoanSource
* EtymologicalNote
* Source [Links to source database]
* +housekeeping fields involving record creation and modification
The Curr, Body Part and Basic data tables are all structured identically.
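The GeneralisedGloss field is meant to be filled in by a script. Here is a rough sketch of what such a script might do, assuming a hand-curated lookup table from source glosses to cover glosses. The table, the function name `generalise_gloss`, and the choice of 'belly' and 'angry' as cover labels are all hypothetical; only the two gloss pairs come from the field description above.

```python
from typing import Optional

# Hypothetical cover-gloss table. Only the two pairs mentioned in the field
# description are included; a real table would be larger and curated by hand.
COVER_GLOSSES = {
    "belly": "belly",
    "stomach": "belly",
    "angry": "angry",
    "cheeky": "angry",
}


def generalise_gloss(gloss: str) -> Optional[str]:
    """Return the cover gloss for a source gloss, or None if it is unmapped.

    Unmapped glosses are left empty for manual checking rather than guessed.
    """
    return COVER_GLOSSES.get(gloss.strip().lower())


# Placeholder records; the source Gloss field is never overwritten.
records = [{"Gloss": "Stomach"}, {"Gloss": "cheeky"}, {"Gloss": "sun"}]
for rec in records:
    rec["GeneralisedGloss"] = generalise_gloss(rec["Gloss"])
```

Leaving unmapped glosses empty, rather than forcing a match, keeps the script's output easy to check by hand.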
TABLE_Reconstructions
* RecRecordNumber [Unique Identifier]
* ReconstructionLevel [Proto-language. Note that this isn't linked to anything at present]
* Form
* Gloss
* PartofSpeech
* SemanticField
* Status
* LoanCode [housekeeping field; probably unnecessary]
* Note
* other housekeeping fields
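To make the links between these tables more concrete, here is a rough sketch of part of the structure as SQLite tables, using Python's standard sqlite3 module. It is not the actual Filemaker setup. The column types, the keys, the assumption that CognateCode stores a RecRecordNumber, and the name `Data` for the per-variety table described under TABLE_variety are all assumptions. The rows are placeholders, not real data.

```python
import sqlite3

# Assumed SQLite rendering of a subset of the fields listed above.
SCHEMA = """
CREATE TABLE Languages (
    LangRecordNumber INTEGER PRIMARY KEY AUTOINCREMENT,
    Variety  TEXT NOT NULL UNIQUE,        -- doculect name used by the source
    Language TEXT,                        -- standardised name
    Attested INTEGER NOT NULL DEFAULT 1   -- 0 for proto-languages
);
CREATE TABLE Reconstructions (
    RecRecordNumber     INTEGER PRIMARY KEY,
    ReconstructionLevel TEXT,             -- not linked to anything, as noted above
    Form  TEXT,
    Gloss TEXT
);
CREATE TABLE Data (
    RecordNumber     INTEGER PRIMARY KEY,
    Variety          TEXT NOT NULL REFERENCES Languages(Variety),
    OriginalForm     TEXT,
    PhonemicisedForm TEXT,
    Gloss            TEXT,
    CognateCode      INTEGER REFERENCES Reconstructions(RecRecordNumber)
);
"""

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # enforce "must exist in TABLE_Languages"
conn.executescript(SCHEMA)

# Placeholder rows for illustration only.
conn.executemany(
    "INSERT INTO Languages (Variety, Language) VALUES (?, ?)",
    [("VarietyA", "LanguageA"), ("VarietyB", "LanguageB")],
)
conn.execute("INSERT INTO Reconstructions VALUES (1, 'Proto-X', 'formP', 'gloss1')")
conn.executemany(
    "INSERT INTO Data (Variety, OriginalForm, Gloss, CognateCode) VALUES (?, ?, ?, ?)",
    [("VarietyA", "formA", "gloss1", 1), ("VarietyB", "formB", "gloss1", 1)],
)


def cognate_set(conn, rec_id):
    """Return (Language, OriginalForm) pairs linked to one reconstruction."""
    return conn.execute(
        """SELECT l.Language, d.OriginalForm
           FROM Data d JOIN Languages l ON d.Variety = l.Variety
           WHERE d.CognateCode = ?
           ORDER BY l.Language""",
        (rec_id,),
    ).fetchall()


# A data row for a variety missing from Languages is rejected.
try:
    conn.execute("INSERT INTO Data (Variety) VALUES (?)", ("UnlistedVariety",))
except sqlite3.IntegrityError as err:
    print("rejected:", err)
```

With the foreign keys switched on, the "must exist in TABLE_Languages" rule is enforced by the database itself and does not depend on data-entry discipline.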
November 19, 2007
If you’ve downloaded my paper on modelling Karnic using NeighborNet, please discard it. There’s a fairly major error in the data coding that skewed all the results. It’s now fixed, and I’ll repost the paper with the corrected data soon.
November 6, 2007
Please find below a list of the Curr (1886) vocabulary lists which have been processed as part of this grant. Most of the lists were already typed up and appear in ASEDA. We have proofread the lists.
We will be adding more information about the modern names of the varieties described here over the next few months.
November 1, 2007
Claire was on fieldwork in October:
Back in Houston:
To do:
November 1, 2007
Dictionary:
Grammar:
Learner’s guide:
Yolngu Dialectology:
September 21, 2007
This update’s a bit early, because I’ll be in Adelaide for ALS next week.
To do:
September 3, 2007
In short, it would be useful to have a searchable morpheme list for Pama-Nyungan languages. Therefore, I am compiling one as part of NSF CAREER grant “Pama-Nyungan and the Prehistory of Australia”. Such a database is a major undertaking, though, so I’m putting out a general request for data contributions.
I’m well aware of the problems of reconstructing morphology in isolation, so this will not be a reconstructions database per se. On the other hand, when reconstructing, say, Karnic, it’s very useful to know whether other languages (in the region or elsewhere) have an ablative morpheme -mu. Currently there’s no easy way to find out this sort of thing.
I will be distributing an Excel file (and other formats, if requested). The file will be an export from my main Filemaker database, which also includes reconstructions, source lists, language/variety/doculect lists, and other information. A sample of what the morphology database will look like is available here.
* RecordNumber
* DateEntered
* Contributor
* Variety
* Source
* OriginalForm
* StandardisedForm
* OriginalGloss
* StandardisedGloss
* Environment
* PartOfSpeech
* OtherNote
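As an illustration of the kind of query the database is meant to support (e.g. "which languages have an ablative -mu?"), here is a minimal sketch that searches a CSV export using some of the fields listed above. It assumes the Excel file has been saved as CSV with these column names. The function `find_morphemes` is hypothetical, and the rows are placeholders, not real data.

```python
import csv
import io

# Placeholder CSV export using a subset of the columns listed above.
SAMPLE = """RecordNumber,Variety,StandardisedForm,StandardisedGloss,PartOfSpeech
1,VarietyA,-mu,ablative,nominal suffix
2,VarietyB,-ku,ablative,nominal suffix
3,VarietyC,-mu,locative,nominal suffix
"""


def find_morphemes(rows, form=None, gloss=None):
    """Return rows matching a standardised form and/or standardised gloss.

    Matching is exact and case-insensitive; None means "don't filter on this field".
    """
    hits = []
    for row in rows:
        if form is not None and row["StandardisedForm"].lower() != form.lower():
            continue
        if gloss is not None and row["StandardisedGloss"].lower() != gloss.lower():
            continue
        hits.append(row)
    return hits


rows = list(csv.DictReader(io.StringIO(SAMPLE)))
ablative_mu = find_morphemes(rows, form="-mu", gloss="ablative")
print([r["Variety"] for r in ablative_mu])
```

Searching on the standardised fields rather than the original ones is what makes a cross-language query like this feasible, since sources differ in spelling and gloss labels.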
The database will be on a password-protected site. Anyone who has contributed data will be given the password. If you need access to the database and aren’t in a position to contribute data, please send me an email outlining why you want access to the database.
A database like this will be updated frequently, so things will get very confusing if there is more than one source for the database file. Furthermore, I need to track usage and download statistics as part of the grant conditions. I have decided to make the database downloadable (rather than queryable online) because I assume that will be more useful to users. You may not pass on the database (or the password) to any third party. Please refer interested parties to me instead.
Updates will be available regularly throughout the project. At this stage, I anticipate releasing updates approximately four times per year, although more frequent updates are possible, depending on how much data we are able to include.
Any and all Pama-Nyungan languages. A list of languages for which data has been contributed (along with any notes about completeness and the source of the data) will appear on the download page. The utility of the database depends to a great extent on how many languages we can include.
Both published and unpublished data can be contributed. However, if the data are unpublished and you are not the collector, I need some sort of statement that the collector gives permission to pass on the data. We do not have the resources to check this sort of thing. If the collector is not in a position to give permission (e.g. because they’re no longer with us), we’ll need some other indication that we are not going against their wishes (or violating anyone’s copyright) by including the data.
No, although the bulk of the project will run for the next three years, and the sooner we get the data, the sooner it can be included.
We will accept data in just about any electronic format. That includes examples and tables cut and pasted from Word documents, text files, lexical databases, and so on. However, you will make our lives vastly easier if you send us structured data (e.g. Shoebox backslash-coded data, Excel spreadsheets, Filemaker data, XML data, etc.).
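To show why backslash-coded data is easy to work with, here is a minimal sketch of reading it into records. It assumes each record begins with a `\lx` marker, that other fields use MDF-style markers such as `\ps` and `\ge`, and that lines without a leading backslash continue the previous field. Contributors' marker sets will vary. The function `parse_backslash` and the sample entries are hypothetical placeholders.

```python
import re

# A marker line is a backslash, the marker name, optional whitespace, then the value.
MARKER_LINE = re.compile(r"^\\(\S+)\s?(.*)$")


def parse_backslash(text, record_marker="lx"):
    """Parse backslash-coded text into a list of records.

    Each record is a list of (marker, value) pairs, so repeated markers
    (e.g. several glosses) are kept in order. Anything before the first
    record marker, such as a file header, is ignored.
    """
    records = []
    current = None
    for raw in text.splitlines():
        line = raw.rstrip()
        if not line:
            continue
        m = MARKER_LINE.match(line)
        if m:
            marker, value = m.group(1), m.group(2).strip()
            if marker == record_marker:
                current = []
                records.append(current)
            if current is not None:
                current.append((marker, value))
        elif current:
            # Continuation line: append to the value of the previous field.
            marker, value = current[-1]
            current[-1] = (marker, (value + " " + line.strip()).strip())
    return records


SAMPLE = r"""\_sh v3.0  400  MDF 4.0
\lx formA
\ps n
\ge gloss one
\ge gloss two

\lx formB
\ps v
\ge long gloss
that wraps
"""

for rec in parse_backslash(SAMPLE):
    print(rec)
```

Keeping each record as an ordered list of pairs, rather than a dictionary, preserves repeated fields and their order, which matters when several glosses or subentries are given.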
Here is a template for data entry, in Excel. A sample with Yolŋu data can be found here.
Please avoid abbreviations (e.g. we would like to avoid situations where it’s impossible to tell if IMP means ‘imperative’ or ‘imperfect’). If you do use abbreviations, please also include a key.
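If you do use abbreviations, a quick check before sending can catch any that are missing from your key. This is a rough heuristic sketch: it only flags all-capital tokens of two or more letters (so something like "1SG" would slip through), and the function name and sample key are hypothetical.

```python
import re

# Heuristic: treat standalone runs of two or more capital letters as abbreviations.
ABBREV = re.compile(r"\b[A-Z]{2,}\b")


def unexplained_abbreviations(glosses, key):
    """Return the set of all-caps tokens in glosses that the key doesn't define."""
    found = set()
    for gloss in glosses:
        found.update(ABBREV.findall(gloss))
    return found - set(key)


key = {"ABL": "ablative", "LOC": "locative"}
glosses = ["ABL", "IMP suffix", "LOC"]
print(sorted(unexplained_abbreviations(glosses, key)))
```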
No; however, the more comprehensive the data, the better the database will be. You can also send us data in installments; however, please don’t resend earlier data that’s already been included. We will do our best to avoid duplication.
If you find an error, please email us a copy of the relevant entries, along with a description of the nature of the error (typo, wrong language, wrong source, etc.), and we’ll fix it.
We will add your data to the database, standardise the orthography (while retaining the orthography of the original in a separate field), and convert the structure and glosses to a format which allows for standard searching across the database.
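A minimal sketch of orthographic standardisation that keeps the source spelling in its own field is shown below. The correspondence table is purely hypothetical (real tables would be specific to each source and checked by hand), and the example word is a nonce form. Keys are tried longest first, so that multi-letter spellings win over their parts when a table mixes lengths.

```python
import re

# Hypothetical source-to-standard spelling correspondences for one source.
SOURCE_TO_STANDARD = {
    "ng": "ŋ",
    "oo": "u",
    "ee": "i",
}


def make_converter(mapping):
    """Build a function that rewrites a form using the mapping in one left-to-right pass."""
    # Longest keys first so multi-letter spellings take priority over shorter ones.
    keys = sorted(mapping, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(k) for k in keys))
    return lambda form: pattern.sub(lambda m: mapping[m.group(0)], form)


standardise = make_converter(SOURCE_TO_STANDARD)


def add_standard_form(record):
    """Return a copy of the record with a StandardisedForm; OriginalForm is untouched."""
    out = dict(record)
    out["StandardisedForm"] = standardise(record["OriginalForm"].lower())
    return out


rec = add_standard_form({"OriginalForm": "Ngeeboo"})  # nonce form, for illustration only
print(rec)
```

Doing the replacement in a single regex pass means an output character is never re-read as input, which avoids chained substitutions when one rule's output resembles another rule's input.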
We will be working on this project as time permits (in addition to the PI, there are about 5 research assistants).
Please email files to proto.pama.nyungan@gmail.com.
I’d be grateful if you acknowledged the database in your work. Please quote the following:
Bowern, Claire (compiler) 2007. Pama-Nyungan morphological database. Version X. URL: http://www.owlnet.rice.edu/~ppny/morphologydatabase.htm
The version information will appear in the file name of the database and on the main database page.
A lexical database is also in progress. It will contain body parts and basic vocab and is part of a long-term reconstruction project. While the morphology database is being compiled as a side-project, the lexicon project is a much bigger undertaking and we hope the full database will be published at the end of the project, along with reconstructions. Further information will be available shortly. We will be humbugging specialists in different areas, but we’d also be happy to accept digital data donations. Please contact me at the above email address for more information.