27 June 2008

Name-generating algorithm

I've been dabbling with algorithms to generate new, fictional names for new, fictional cultures for ages. Sometimes I just goof around, sometimes I do real research and sometimes I devise implementations. I've got some code that I've been playing with today and I decided to write about what I'm doing. As you will see, the results are not yet entirely satisfying.

Check out the short wiki article on syllable theory for some basics. There's lots more to read on the web if you want to.

The first step was to build up a potential language. I selected an incomplete, base set of phonemes with which I would work:

Co: b, f, j, k, l, p, r, st, y
Cr: b, d, ph, g, h, dge, ck, ll, m, n, p, r, s, t, v, gh, z
v: a, e, ee, o, oo, u, au, ea, eu, oa, ou, ua, ue

I chose these five years ago when I was first developing a spreadsheet that would generate all the possible syllables of certain configurations. I know that keeping the raw numbers down was one concern, but I frankly don't recollect why I kept the ones I did. It doesn't really matter, and I could swap them out as desired.

(I think that Co indicates consonants used in syllabic onset and Cr indicates consonants used in the rime. Note some, but not total, overlap.)

I was writing this up today as an element in a little game that I've been messing with so that pawns would have names and personalities. Because of this context I wanted each game to have a cohesive linguistic feel. I went with those default phonemes above and then for each game-run, limited the available elements by removing roughly 50% of them at random. (Each phoneme had a 50% chance to stay or go so really weird things could occasionally happen.)

In one example (which I'll stick to through this article), the retained phonemes look like this:
Co: j, k, p, st, y
Cr: b, ph, g, h, dge, ck, ll, m, p, r, t, v
v: oo, u, ea, oa, ou
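
The per-phoneme coin flip described above can be sketched roughly as follows. This is a minimal illustration, not the class posted at the end of the article; the names KeepRandomSubset and rng are made up for the example. Because every phoneme is decided independently, the kept list can be anywhere from empty to complete, which is where the occasional weirdness comes from.

```vbnet
Imports System
Imports System.Collections.Generic

Module TrimSketch
    ' Hypothetical helper: keeps each item with an independent 50% chance,
    ' so the result may hold anywhere from zero items to all of them.
    Function KeepRandomSubset(ByVal items As IEnumerable(Of String), ByVal rng As Random) As List(Of String)
        Dim kept As New List(Of String)
        For Each item As String In items
            ' rng.Next(0, 2) returns 0 or 1 (the upper bound is exclusive).
            If rng.Next(0, 2) = 1 Then kept.Add(item)
        Next
        Return kept
    End Function

    Sub Main()
        Dim rng As New Random()
        ' The default onset consonants from the article.
        Dim onsets() As String = {"b", "f", "j", "k", "l", "p", "r", "st", "y"}
        Dim kept As List(Of String) = KeepRandomSubset(onsets, rng)
        Console.WriteLine("Co: " & String.Join(", ", kept.ToArray()))
    End Sub
End Module
```

Running it several times prints a different onset set each time, which is what gives each game its own sound.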

The next step is to build the syllables that the language uses. I started out accumulating EVERY possible syllable under three different schemes. But that's too much, because it led to a bunch of over-similar syllables and the language ended up feeling funny. I already had a function that randomly drops about half of an array, as described above, so I just ran the syllable lists through that.

So the three kinds of syllables that I built are: VC Rime, CVC and CV. Using the above-described phonemes and trimming algorithm, I created the following syllables that would be available for use in names (or other words, for more ambitious projects):

R/Rime: oob, oodge, oock, ooll, ub, uph, uh, uck, ull, up, eall, eap, ear, oab, oaph, oag, oadge, oack, oat, oav, oum, our, ouv

CoR/CVC: joodge, jooll, jup, jeall, joaph, joag, joadge, joack, joat, koob, koodge, koock, kooll, kub, kuh, kull, kup, keall, koab, koaph, koag, koack, koat, koum, kouv, poob, poodge, pub, puck, pup, pear, poab, poag, poadge, poack, poat, poav, poum, stoob, stub, stuh, stuck, stull, stup, steap, stear, stoaph, stoag, stoadge, stoack, stoat, stour, yoob, yoodge, yoock, yooll, yub, yuph, yuh, yull, yup, yeall, yeap, yoack, yoat, yoav, youm, your

CoV/CV: jea, joa, jou, ku, koa, poo, pu, poa, pou, stoo, stea, yea, yoa
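
To give a feel for why the full sets were too much, here is a small sketch that builds every combination from the retained phonemes of the running example and counts them. It is only an illustration: the helper name Combine is made up, and the trimming step is left out so that the counts are fixed. (The class at the end of the article trims the rimes first and then builds the CVC syllables from what is left, so its lists come out smaller.)

```vbnet
Imports System
Imports System.Collections.Generic

Module SyllableSketch
    ' Hypothetical helper: every pairing of a first part with a second part.
    Function Combine(ByVal firsts As IEnumerable(Of String), ByVal seconds As IEnumerable(Of String)) As List(Of String)
        Dim result As New List(Of String)
        For Each a As String In firsts
            For Each b As String In seconds
                result.Add(a & b)
            Next
        Next
        Return result
    End Function

    Sub Main()
        ' Retained phonemes from the example in the article.
        Dim co() As String = {"j", "k", "p", "st", "y"}
        Dim cr() As String = {"b", "ph", "g", "h", "dge", "ck", "ll", "m", "p", "r", "t", "v"}
        Dim v() As String = {"oo", "u", "ea", "oa", "ou"}

        Dim rimes As List(Of String) = Combine(v, cr)       ' VC: vowel + rime consonant
        Dim cvc As List(Of String) = Combine(co, rimes)     ' CVC: onset + rime
        Dim cv As List(Of String) = Combine(co, v)          ' CV: onset + vowel

        ' Prints: 60 rimes, 300 CVC, 25 CV
        Console.WriteLine("{0} rimes, {1} CVC, {2} CV", rimes.Count, cvc.Count, cv.Count)
    End Sub
End Module
```

Three hundred CVC syllables from only five onsets is a lot of near-duplicates, which is why each list gets the coin-flip treatment afterwards.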

Then, in order to put these together to form words, I identified eight configurations (some taken from literature and some made up by me) that were reasonably likely to produce plausible word-forms. They are:

CoR 'CVC
CoV_CoR 'CV_CVC
CoV_CoV 'CV_CV
CoR_CoR 'CVC_CVC
CoR_CoV 'CVC_CV
R_CoV 'R_CV
R_CoR 'R_CVC
CoR_R 'CVC_R
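
Turning one of these patterns into a word amounts to splitting it on the underscore and drawing a random syllable from the matching list for each piece. A rough sketch, assuming a dictionary keyed by the list names (BuildWord, pools and rng are names invented for this example, and the pools hold only a few syllables copied from the lists above):

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Text

Module PatternSketch
    ' Hypothetical helper: replaces each "_"-separated segment of the pattern
    ' with a random syllable from the pool that has the same name.
    Function BuildWord(ByVal pattern As String, ByVal pools As Dictionary(Of String, List(Of String)), ByVal rng As Random) As String
        Dim sb As New StringBuilder()
        For Each seg As String In pattern.Split("_"c)
            Dim pool As List(Of String) = pools(seg)
            sb.Append(pool(rng.Next(0, pool.Count)))
        Next
        Return sb.ToString()
    End Function

    Sub Main()
        ' A handful of syllables taken from the example lists.
        Dim pools As New Dictionary(Of String, List(Of String))
        pools.Add("R", New List(Of String)(New String() {"oob", "uck", "eap"}))
        pools.Add("CoR", New List(Of String)(New String() {"koab", "stull", "yoodge"}))
        pools.Add("CoV", New List(Of String)(New String() {"koa", "pu", "stea"}))

        Dim rng As New Random()
        Console.WriteLine(BuildWord("CoR_CoV", pools, rng))   ' one CVC syllable, then one CV syllable
        Console.WriteLine(BuildWord("CoR_R", pools, rng))     ' one CVC syllable, then one rime
    End Sub
End Module
```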

In my application, I just randomly select one as the pattern to which male names will adhere and one to which female names will. Using the above syllables, my app generated the following sixty names:


female names using (CoR_CoV):
koabkoa
koagstea
koatpou
stoatyoa
stuhpu
stupjoa
stulljou
koodgeyoa
poadgestoo
poabpoa
stearku
stubyoa
stoaphpu
poobkoa
kealljoa
koollpu
joatpu
poavstea
stoagjou
stullpu
joollstea
stuckkoa
yullstea
stoatpoo
koodgepoo
stearpoo
poodgejoa
stourjea
yoodgekoa
joadgestea


male names using (CoR_R):
keallup
stoageap
koackooll
joodgeoock
stulluck
kealluh
joatouv
poatoav
stearuh
joadgeoum
yoodgeouv
koockoav
puboack
joadgeoob
yoobear
jupull
joatoat
kulloaph
joackoock
youroadge
yooboag
yoodgeoav
koockup
koackup
stooboum
koumeap
kupub
joolluh
kulloav
joadgeoab


Sadly, these are really hideous names. They do seem to share a set of sounds and rules, but a horrible one. So, clearly, I have some work to do to avoid Joadgeoum and Koodgepoo, but I'm getting there.

If anyone wants the VB.NET class that I'm using to generate this stuff, here it is:


Imports System
Imports System.Collections.Generic
Imports System.Text

Public Class Language
    Private rand As New Random

    ' Base phoneme sets; the constructor trims each one to a random subset.
    Public Co() As String = {"b", "f", "j", "k", "l", "p", "r", "st", "y"}
    Public Cr() As String = {"b", "d", "ph", "g", "h", "dge", "ck", "ll", "m", "n", "p", "r", "s", "t", "v", "gh", "z"}
    Public v() As String = {"a", "e", "ee", "o", "oo", "u", "au", "ea", "eu", "oa", "ou", "ua", "ue"}

    Public R As New List(Of String) 'VC (VCr) Rimes
    Public CoR As New List(Of String) 'CVC (CoVCr) Syllables
    Public CoV As New List(Of String) 'CV (CoV) Syllables

    ' One of the eight patterns (values 0 to 7) is picked at random for each sex.
    Public malePattern As wordPattern = CType(rand.Next(0, 8), wordPattern)
    Public femalePattern As wordPattern = CType(rand.Next(0, 8), wordPattern)

    Public Sub New()
        ' Trim the phonemes first, then build the syllable lists from what is left.
        Co = randomHalf(Co)
        Cr = randomHalf(Cr)
        v = randomHalf(v)
        popR()
        popCoR()
        popCoV()
    End Sub

    Public Function getName(ByVal sex As gender) As String
        Dim pattern As wordPattern
        If sex = gender.male Then pattern = malePattern Else pattern = femalePattern
        Dim sb As New StringBuilder
        ' The enum member's name doubles as the recipe, e.g. "CoR_CoV".
        For Each seg As String In pattern.ToString().Split("_"c)
            Select Case seg
                Case "Co" : sb.Append(pick(Co))
                Case "Cr" : sb.Append(pick(Cr))
                Case "v" : sb.Append(pick(v))
                Case "R" : sb.Append(pick(R))
                Case "CoR" : sb.Append(pick(CoR))
                Case "CoV" : sb.Append(pick(CoV))
            End Select
        Next
        Return sb.ToString()
    End Function

    ' Returns a random element of the pool, or "" if trimming left the pool empty.
    Private Function pick(ByVal pool As IList(Of String)) As String
        If pool.Count = 0 Then Return ""
        Return pool(rand.Next(0, pool.Count))
    End Function

    ' Rimes: every vowel followed by every rime consonant, then trimmed.
    Private Sub popR()
        For Each seg1 As String In v
            For Each seg2 As String In Cr
                R.Add(seg1 + seg2)
            Next
        Next
        R = randomHalf(R)
    End Sub

    ' CVC syllables: every onset followed by every (already trimmed) rime, then trimmed.
    Private Sub popCoR()
        For Each seg1 As String In Co
            For Each seg2 As String In R
                CoR.Add(seg1 + seg2)
            Next
        Next
        CoR = randomHalf(CoR)
    End Sub

    ' CV syllables: every onset followed by every vowel, then trimmed.
    Private Sub popCoV()
        For Each seg1 As String In Co
            For Each seg2 As String In v
                CoV.Add(seg1 + seg2)
            Next
        Next
        CoV = randomHalf(CoV)
    End Sub

    ' Array version: delegates to the list version, so an empty result is a
    ' truly empty array rather than an array holding one empty string.
    Private Function randomHalf(ByVal a() As String) As String()
        Return randomHalf(New List(Of String)(a)).ToArray()
    End Function

    ' Keeps each element with an independent 50% chance.
    Private Function randomHalf(ByVal a As List(Of String)) As List(Of String)
        Dim output As New List(Of String)
        For Each item As String In a
            If rand.Next(0, 2) = 1 Then output.Add(item)
        Next
        Return output
    End Function

End Class

Public Enum wordPattern
    CoR 'CVC
    CoV_CoR 'CV_CVC
    CoV_CoV 'CV_CV
    CoR_CoR 'CVC_CVC
    CoR_CoV 'CVC_CV
    R_CoV 'R_CV
    R_CoR 'R_CVC
    CoR_R 'CVC_R
End Enum

' The gender type is used by getName but was not part of the listing;
' this definition is an assumption so that the class compiles on its own.
Public Enum gender
    female = 0
    male = 1
End Enum
