If you’re anything like me, you’ve been taking digital pictures for a long time now. You’ve used various strategies and tools for organizing them over the years. You’ve gone through several computers and moved the files around countless times. Where does that leave you? With an unorganized mess.
With library software like Picasa you can import all of those pictures and get some semblance of organization through the user interface, but that doesn’t solve the core problem: the files themselves are still a mess.
I wanted to give all of my pictures a consistent filename, organize them into folders based on the month and year they were taken, and remove duplicates. At first it sounded like a tall order, as I couldn’t find any off-the-shelf tools that did this. Thankfully, with just a little time in PowerShell, I was able to put together a script that did it for me.
The script does the following:
- Identify all of the existing pictures
- Query the EXIF data
- Calculate an MD5 hash of the file
- Save a renamed copy, with a name built from that data, in a “staging” folder
- Parse the new file name to get the MD5 and look for duplicates
- Delete all but one copy of each duplicate
- Move the files to the YYYY\MM folders
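To make the naming scheme behind these steps concrete, here is a minimal sketch of building a name in the YYYY-MM-DD-HH-MM-SS-WIDTHxHEIGHT-MD5-ID.Extension format and parsing the MD5 back out of it. New-PhotoName and ConvertFrom-PhotoName are hypothetical helper names, the sample dimensions and hash are made up, and [pscustomobject] assumes PowerShell 3.0 or later.

```powershell
# Minimal sketch of the naming scheme (PowerShell 3.0+ for [pscustomobject]).
# New-PhotoName and ConvertFrom-PhotoName are hypothetical helper names.

# Builds YYYY-MM-DD-HH-MM-SS-WIDTHxHEIGHT-MD5-ID.Extension
function New-PhotoName {
    param(
        [string]$Date,       # already filename-safe, e.g. "2010-07-04-13-45-10"
        [int]$Width,
        [int]$Height,
        [string]$Md5,
        [int]$Id,
        [string]$Extension   # includes the dot, e.g. ".jpg"
    )
    [string]::Format("{0}-{1}x{2}-{3}-{4}{5}", $Date, $Width, $Height, $Md5, $Id, $Extension)
}

# Splits a name back into its fields. Indexes 0-5 are the date/time,
# 6 is WIDTHxHEIGHT, 7 is the MD5 and 8 is the ID.
function ConvertFrom-PhotoName {
    param([string]$Name)
    $parts = [System.IO.Path]::GetFileNameWithoutExtension($Name).Split("-")
    [pscustomobject]@{
        Year  = $parts[0]
        Month = $parts[1]
        Size  = $parts[6]
        Md5   = $parts[7]
        Id    = [int]$parts[8]
    }
}

$name = New-PhotoName -Date "2010-07-04-13-45-10" -Width 3264 -Height 2448 `
    -Md5 "0123456789ABCDEF0123456789ABCDEF" -Id 12 -Extension ".jpg"
$name                                 # 2010-07-04-13-45-10-3264x2448-0123456789ABCDEF0123456789ABCDEF-12.jpg
(ConvertFrom-PhotoName $name).Md5     # 0123456789ABCDEF0123456789ABCDEF
```

Parsing relies purely on field positions, so any change to the format (for example, adding another field before the MD5) shifts the index that the duplicate check has to read.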
The first task, querying the EXIF data of the files, was actually the hardest. I found a couple of blog articles that touched on this. The short version is that you have to use the .NET System.Drawing assembly. We use Get-ChildItem to recurse through a directory structure looking for files of a specific type. For each file we find, we instantiate a new Bitmap object, which exposes the EXIF properties so we can read and update them. I thought it might be useful to store the original path as a “Comment” in the EXIF in case it contained some relevant information that I later want to turn into a tag.
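Here is a minimal sketch of the date cleanup on its own. It assumes the raw value of EXIF tag 36867 (DateTimeOriginal) arrives as ASCII bytes in the form YYYY:MM:DD HH:MM:SS followed by a null terminator; ConvertTo-FileSafeDate is a hypothetical helper name. The sample bytes are built by hand so the sketch runs without System.Drawing; with a loaded Bitmap you would pass the property's Value instead.

```powershell
# Minimal sketch: turn the raw DateTimeOriginal value (EXIF tag 36867) into a
# filename-friendly string. ConvertTo-FileSafeDate is a hypothetical helper name.
# Assumes the bytes are ASCII "YYYY:MM:DD HH:MM:SS" plus a trailing null.
function ConvertTo-FileSafeDate {
    param([byte[]]$Bytes)
    # Same fallback value the script uses when a picture has no EXIF date
    if ($null -eq $Bytes -or $Bytes.Length -eq 0) { return "0000-00-00-00-00-00" }
    $text = [System.Text.Encoding]::ASCII.GetString($Bytes)
    # Drop the null terminator, then turn spaces and colons into dashes
    $text.Replace("`0", "").Replace(" ", "-").Replace(":", "-")
}

# With System.Drawing the bytes would come from: $img.GetPropertyItem(36867).Value
# Here they are built by hand so the sketch runs anywhere.
$raw = [System.Text.Encoding]::ASCII.GetBytes("2010:07:04 13:45:10`0")
ConvertTo-FileSafeDate $raw    # 2010-07-04-13-45-10
ConvertTo-FileSafeDate $null   # 0000-00-00-00-00-00
```

Keeping the conversion separate from the Bitmap code makes it easy to check the string handling without any image files at hand.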
Below is the complete script.
#Load the .NET System.Drawing assembly for examining the EXIF data of the pictures.
#Add-Type locates the installed assembly, so we don't depend on a hard-coded framework path.
Add-Type -AssemblyName System.Drawing

#Function to calculate the MD5 hash of a file
function Get-MD5([System.IO.FileInfo] $file = $(throw 'Usage: Get-MD5 [System.IO.FileInfo]'))
{
    # This Get-MD5 function sourced from:
    # http://blogs.msdn.com/powershell/archive/2006/04/25/583225.aspx
    $stream = $null;
    $cryptoServiceProvider = [System.Security.Cryptography.MD5CryptoServiceProvider];
    $hashAlgorithm = New-Object $cryptoServiceProvider
    $stream = $file.OpenRead();
    $hashByteArray = $hashAlgorithm.ComputeHash($stream);
    $stream.Close();

    ## We have to be sure that we close the file stream if any exceptions are thrown.
    trap
    {
        if ($stream -ne $null) { $stream.Close(); }
        break;
    }

    #Convert each byte to two uppercase hex digits
    $md5 = ""
    foreach ($byte in $hashByteArray) { $md5 = $md5 + $byte.ToString("X2"); }
    return $md5;
}

#Figure out where the script lives and whether there is a subfolder called output. If not, we create one. This is where all of our images will go.
$currdir = Split-Path -Parent $MyInvocation.MyCommand.Definition
$outputdir = $currdir + "\output\"
if (!(Test-Path -Path $outputdir)) { New-Item $outputdir -Type Directory }

#What files should we look for? Typically this would be *.jpg.
$ext = "*.jpg"

#Using Get-ChildItem we search recursively, from the script's folder down, for all files matching our extension.
#The output folder is skipped so that a second run doesn't reprocess its own copies.
$files = Get-ChildItem -Path $currdir -Recurse -Include $ext |
    Where-Object { -not $_.FullName.StartsWith($outputdir, [System.StringComparison]::OrdinalIgnoreCase) }

#We keep track of how many files we process and put a unique number in each filename (eliminating any chance of a duplicate filename)
$i = 0;
foreach($f in $files)
{
    #Increment our counter
    $i++;

    #Load up the .NET System.Drawing.Bitmap object for the current file. We use it to read and update the EXIF data.
    $img = New-Object -TypeName System.Drawing.Bitmap -ArgumentList $f.FullName;

    #Grab the camera date, height, and width. We use try/catch in case the property isn't available and set a default value.
    #For more details on the available properties see:
    #http://blogs.technet.com/b/jamesone/archive/2007/07/13/exploring-photographic-exif-data-using-powershell-of-course.aspx
    Try
    {
        #Property 36867 is DateTimeOriginal. The value is a byte array which we need to convert (assuming the ASCII character set)
        $date = [System.Text.Encoding]::ASCII.GetString($img.GetPropertyItem(36867).Value);
    }
    Catch [System.Exception]
    {
        #Default value in case we can't access the EXIF date
        $date = "0000:00:00 00:00:00";
    }

    #Grab the height and width of our image
    $height = $img.Height;
    $width = $img.Width;

    #The date is returned as a null-terminated string with spaces and colons in it. We replace all of these with dashes to make it filename friendly.
    $date = (($date.Replace("`0", "")).Replace(" ","-")).Replace(":","-");

    #Calculate the MD5 of the original file so that we can look for duplicates later
    $md5 = Get-MD5 $f;

    #The target filename is the output directory followed by the values concatenated in the format below.
    #Format will be YYYY-MM-DD-HH-MM-SS-WIDTHxHEIGHT-MD5-ID.Extension
    $filename = $outputdir + [string]::Format("{0}-{1}x{2}-{3}-{4}{5}", $date, $width, $height, $md5, $i, $f.Extension);

    #We want to save the current path as a comment. PropertyItem has no public constructor, so we reuse an existing
    #property and turn it into tag 40092 (XPComment). This only works if the image already has at least one property.
    if ($img.PropertyItems.Count -gt 0)
    {
        $property = $img.PropertyItems[0];
        $property.Id = 40092;
        $property.Type = 1;
        #The value must be a byte array, so we encode the current path ($f.FullName) and the MD5 as Unicode bytes
        $property.Value = [System.Text.Encoding]::Unicode.GetBytes($f.FullName + ":" + $md5);
        $property.Len = $property.Value.Count;
        $img.SetPropertyItem($property);
    }

    #Save the image from memory to the path of our new file. Saving re-encodes the JPEG, which is why the MD5 above is taken from the original file.
    $img.Save($filename);
    $img.Dispose();

    #Could delete the old version; I'm leaving it as a backup, so this is commented out.
    #Remove-Item $f.FullName;

    #Let the user know the current status and which files are being copied.
    Write-Output "Copying $($f.FullName) to $filename";
}

#This one-liner splits each filename, groups the files by MD5, ignores the first file in each group and deletes the rest.
#The MD5 is the field at zero-based index 7 (after the six date/time fields and the size), hence the hard-coded index.
#Only files are considered, so YYYY folders left by an earlier run are never mistaken for duplicates.
$o = Get-ChildItem $outputdir `
    | Where-Object { -not $_.PSIsContainer } `
    | Select-Object @{Name="MD5";Expression={($_.Name).Split("-")[7]}}, @{Name="Filename";Expression={$_.FullName}} `
    | Group-Object MD5 `
    | ?{ $_.Count -gt 1 } `
    | % { ($null, $rest) = $_.Group; $rest; } `
    | Select-Object Filename `
    | % { Write-Output "Removing duplicate $($_.Filename)"; Remove-Item $_.Filename }
Write-Output $o

#Now that we've zapped our duplicates we can move the files out to their target locations.
#We loop through them again, extracting the important bits from the filename.
#Target format will be YYYY-MM-DD-HH-MM-SS-WIDTHxHEIGHT-ID.Extension (the MD5 is dropped)
$files = Get-ChildItem $outputdir | Where-Object { -not $_.PSIsContainer }
$i = 0;
foreach($f in $files)
{
    $i++;
    #Split the name on dashes to create an array of attributes
    $name = ($f.Name).Split("-");
    $year = $name[0];
    $month = $name[1];
    $day = $name[2];
    $hour = $name[3];
    $min = $name[4];
    $sec = $name[5];
    $size = $name[6];
    $md5 = $name[7];

    #Where are we putting the new file?
    $targetname = $outputdir + $year + "\" + $month + "\" + [string]::Format("{0}-{1}-{2}-{3}-{4}-{5}-{6}-{7}{8}", $year, $month, $day, $hour, $min, $sec, $size, $i, $f.Extension);

    #Make sure that our folders exist (one for each month under the year) and create them if not
    if (!(Test-Path -Path ($outputdir + $year))) { New-Item ($outputdir + $year) -Type Directory }
    if (!(Test-Path -Path ($outputdir + $year + "\" + $month))) { New-Item ($outputdir + $year + "\" + $month) -Type Directory }

    #Move the file to its new home. -Force tells it to overwrite the target if it already exists.
    Write-Output "Moving $($f.FullName) to $targetname";
    Move-Item -Path $f.FullName -Destination $targetname -Force
}
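On PowerShell 4.0 or later, a minimal alternative sketch for the last two stages is to hash the files directly with the built-in Get-FileHash cmdlet and group on the hash, rather than parsing the MD5 back out of the filename. The YYYY\MM move can also be shortened, because New-Item -Force does not fail when the folder already exists. Find-DuplicatePhoto and Move-ToMonthFolder are hypothetical names; the demo uses a throwaway temp folder with made-up file names and contents, and Join-Path keeps it portable across platforms.

```powershell
# Minimal sketch (assumes PowerShell 4.0+ for Get-FileHash and 3.0+ for -File).
# Find-DuplicatePhoto and Move-ToMonthFolder are hypothetical helper names.

# Returns the paths of files whose MD5 matches an earlier file
# (earlier = first in sorted path order), i.e. the copies safe to delete.
function Find-DuplicatePhoto {
    param([string]$Path, [string]$Filter = "*.jpg")
    Get-ChildItem -Path $Path -Filter $Filter -File -Recurse |
        Sort-Object FullName |
        Get-FileHash -Algorithm MD5 |
        Group-Object -Property Hash |
        Where-Object { $_.Count -gt 1 } |
        ForEach-Object { $_.Group | Select-Object -Skip 1 } |
        Select-Object -ExpandProperty Path
}

# Moves a file named YYYY-MM-... into <Root>/YYYY/MM and returns its new path.
function Move-ToMonthFolder {
    param([string]$Root, [System.IO.FileInfo]$File)
    $parts = $File.Name.Split("-")
    $target = Join-Path (Join-Path $Root $parts[0]) $parts[1]
    # -Force: no error if the folder already exists (missing parents are created too)
    New-Item -ItemType Directory -Path $target -Force | Out-Null
    Move-Item -LiteralPath $File.FullName -Destination $target -Force
    Join-Path $target $File.Name
}

# Demo in a throwaway temp folder with made-up names and contents
$root = Join-Path ([System.IO.Path]::GetTempPath()) ("photo-demo-" + [guid]::NewGuid())
New-Item -ItemType Directory -Path $root | Out-Null
Set-Content -LiteralPath (Join-Path $root "2010-07-04-13-45-10-10x10-A-1.jpg") -Value "same bytes"
Set-Content -LiteralPath (Join-Path $root "2010-07-04-13-45-10-10x10-A-2.jpg") -Value "same bytes"
Set-Content -LiteralPath (Join-Path $root "2011-01-02-03-04-05-10x10-B-3.jpg") -Value "other bytes"

# Delete every duplicate, keeping one copy of each distinct file
$dupes = @(Find-DuplicatePhoto -Path $root)
foreach ($d in $dupes) { Remove-Item -LiteralPath $d }

# Snapshot the remaining files first, then move each into its YYYY/MM folder
$remaining = @(Get-ChildItem -LiteralPath $root -File)
$moved = foreach ($file in $remaining) { Move-ToMonthFolder -Root $root -File $file }
$moved
```

Hashing the files directly means the duplicate check no longer depends on the MD5 sitting at a fixed position in the name. The trade-off is that every file is read a second time; the original script avoids that by reusing the hash it already computed. Delete the temp folder in $root when you're done experimenting.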
Here are a couple of references that I found helpful in building this script:
http://blog.codeassassin.com/2007/10/13/find-duplicate-files-with-powershell/