I recently had to test the consistency of search indexing as part of a proof of concept for a client.

This testing happened to coincide with a new demo I was building for community/conference gigs, where I needed to populate a SharePoint site with a shed ton of data (100,000-plus documents), so I figured I needed a way to generate documents on an industrial scale.

The challenge was to come up with a way of creating documents that were not just full of Lorem Ipsum or a similarly constrained vocabulary. I wanted to exercise the client's search index properly, not load it with what would effectively be a pile of documents full of the same keywords.

Needless to say, PowerShell would be the solution to my woes…

Rather than spending a bunch of time explaining the makeup of the script, I'll let you check it out; if you have any questions, drop a comment.

    Function New-RandomlyCreatedWordDocuments {
        Param (
            [Parameter(Mandatory=$True)]
            [string]$seedfile,
            [Parameter(Mandatory=$True)]
            [string]$seednames,
            [Parameter(Mandatory=$True)]
            [int]$documentstocreate,
            [Parameter(Mandatory=$True)]
            [string]$outputpath
        )

        # let's spin the variables
        $textarray = Get-Content $seedfile -Delimiter "`t"
        $count = New-Object System.Random
        $loops = 0
        $rand = New-Object System.Random
        $words = Import-Csv $seednames

        # i don't like wiring values into variables, but this way is simpler in the context of this script
        $conjunction = "to budget for","to keep","and","with","without","in","for","to remove the"

        # create the Word COM object
        # Microsoft Word must be installed on the machine this script is run upon
        # original inspiration came from [link] #kudos
        $word = New-Object -ComObject "Word.Application"

        do {
            # spin up a new document
            $doc = $word.Documents.Add()
            $selection = $word.Selection

            # insert some random text
            $paraloop = 1
            $paragraphs = $count.Next(50)
            do {
                $randomiser = $count.Next($textarray.Count)
                $selection.TypeText($textarray.get_Item($randomiser))
                $paraloop++
            } while ($paraloop -lt $paragraphs)
            $selection.TypeParagraph()

            # save the document with a great filename
            # this is from hanselman
            $word1 = ($words[$count.Next(0,$words.Count)]).Label
            $con = ($conjunction[$rand.Next($conjunction.Count)])
            $word2 = ($words[$count.Next(0,$words.Count)]).Label
            # end of the hanselman bit

            $documentname = $word1+" "+$con+" "+$word2+".docx"
            $doc.SaveAs([ref]($outputpath+$documentname))
            $doc.Close()

            Write-Host "Generated document number"($loops+1)
            $loops++
        } until ($loops -eq $documentstocreate)

        # exit word
        $word.Quit()
    }

    # now we can invoke the function and pass our parameters in
    Clear-Host
    New-RandomlyCreatedWordDocuments -seedfile "C:\bigseb\demodox\source\para.txt" -seednames "C:\bigseb\demodox\source\subjects.csv" -documentstocreate 50 -outputpath "C:\bigseb\demodox\"

I’ve included the two seed files I used in the GitHub repo here to make your life easier. Note that the first is a tab-separated file based on a SQL Server whitepaper (its copyright is owned by Microsoft), and the second is a CSV file of the Integrated Public Sector Vocabulary (IPSV) as used by the public sector in the UK. It contains some terms that may not be suitable for all of your needs, so it's worth a quick review before you use it 🙂
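
If you would rather roll your own seed files, the shapes the script expects can be sketched roughly like this. It is a minimal illustration rather than part of the original script: the temp folder name, the sample paragraphs and the three sample labels are all made up, and the conjunction list is shortened. The only things taken from the script are the tab delimiter, the `Label` column and the way the filename is stitched together. It doesn't touch Word, so it should run on a machine without Office.

```powershell
# Hypothetical sample data; swap in your own files (or the ones from the repo).
$tempDir = Join-Path ([System.IO.Path]::GetTempPath()) "seed-demo"
New-Item -ItemType Directory -Path $tempDir -Force | Out-Null

$paraFile = Join-Path $tempDir "para.txt"
$csvFile  = Join-Path $tempDir "subjects.csv"

# The paragraph seed file: chunks of text separated by tab characters
Set-Content -Path $paraFile -Value "First paragraph.`tSecond paragraph.`tThird paragraph." -NoNewline

# The names seed file: a CSV with a column called Label
Set-Content -Path $csvFile -Value @("Label", "Allotments", "Bus lanes", "Council tax")

# Read both files the same way the script does
$textarray = Get-Content $paraFile -Delimiter "`t"
$words     = Import-Csv $csvFile

# Build a filename from two random labels and a random conjunction
$rand        = New-Object System.Random
$conjunction = "to keep", "and", "with"   # shortened list, for illustration only
$word1 = $words[$rand.Next($words.Count)].Label
$con   = $conjunction[$rand.Next($conjunction.Count)]
$word2 = $words[$rand.Next($words.Count)].Label
$documentname = $word1 + " " + $con + " " + $word2 + ".docx"

Write-Host "Paragraph chunks read:" $textarray.Count
Write-Host "Sample filename:" $documentname
```

One thing to watch with your own CSV: labels can contain characters that aren't valid in Windows filenames, so a quick check of the list before a big run is probably worthwhile.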

more to follow…