PHP has always been a handy scripting language to know. I’ve completed many personal projects with it, ranging from WordPress plugins to scripts for organizing my music. My latest creation, however, is a database of every movie trilogy, along with information about each film. Using this database, I pre-generated pages for every movie trilogy and then built a website focused on listing all of the movies in their proper order. Below are the steps I took to make this happen.
The first part of my journey was finding a full list of every movie trilogy. I found such a list on Wikipedia, although it was a little rough around the edges, and I had to edit many of the lines before I had a workable list that could be entered into a database. Once the movies were in the database, the next task was finding a working IMDB API to pull more information about them. While IMDB does not officially offer a free API, I was able to find a free third-party one.
The third-party API returned JSON, so parsing it with PHP was a breeze. I wrote a script that looked up every row in my SQL database, took the movie title, and queried the API with it. The script then filled in the missing information for each row, such as actors, rating, description, producers, and more. The last piece of information was the link to each movie’s DVD release image. For now, I simply kept the IMDB image URL in my database and didn’t download anything; that would be the next step.
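A minimal sketch of that enrichment pass might look like the following. The table name, column names, JSON keys (`Actors`, `imdbRating`, `Plot`, `Poster`), and the API query format are all assumptions, since the original script and API aren't shown:

```php
<?php
// Map a decoded API payload onto our database columns. The JSON keys
// here are assumptions about what the third-party API returns.
function extractFields(array $json): array {
    return [
        'actors'      => $json['Actors'] ?? '',
        'rating'      => $json['imdbRating'] ?? '',
        'description' => $json['Plot'] ?? '',
        'image_url'   => $json['Poster'] ?? '', // keep the remote URL for now
    ];
}

// Walk every row, query the API by title, and fill in the blanks.
function enrichAll(PDO $db, string $apiBase): void {
    foreach ($db->query('SELECT id, title FROM movies') as $row) {
        $raw = @file_get_contents($apiBase . '?t=' . urlencode($row['title']));
        if ($raw === false) {
            continue; // skip rows the API couldn't resolve
        }
        $json = json_decode($raw, true);
        if (!is_array($json)) {
            continue; // skip malformed responses
        }
        $stmt = $db->prepare(
            'UPDATE movies SET actors = :actors, rating = :rating,
                    description = :description, image_url = :image_url
              WHERE id = :id'
        );
        $stmt->execute(extractFields($json) + ['id' => $row['id']]);
    }
}
```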
The next script looked up every movie in my database, took its image URL, downloaded the image, and saved it locally. There was a minor hitch in this plan: IMDB limited the amount of data I could download. To remedy this, I added a small delay of 100ms between each download. Because of this delay, and the size of the database, the script took a few hours to complete. I should also note that I had to edit my php.ini to allow a script to run for an unlimited amount of time.
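The download loop could be sketched like this; the filename helper is my own illustration, not the original script's approach, and `set_time_limit(0)` achieves in code what the php.ini edit (`max_execution_time = 0`) achieves globally:

```php
<?php
set_time_limit(0); // same effect as max_execution_time = 0 in php.ini

// Derive a safe local filename from a movie title (hypothetical helper).
function localImageName(string $title): string {
    return preg_replace('/[^a-z0-9]+/', '-', strtolower($title)) . '.jpg';
}

// Fetch one poster, save it locally, then pause 100ms to stay
// under the rate limit.
function downloadPoster(string $url, string $dir, string $title): bool {
    $data = @file_get_contents($url);
    if ($data === false) {
        return false; // leave the row for a retry pass
    }
    file_put_contents($dir . '/' . localImageName($title), $data);
    usleep(100000); // 100 ms delay between downloads
    return true;
}
```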
Lastly, I had to add this information to my website somehow. I chose WordPress as the CMS, so adding new posts was a matter of looking up a few WordPress functions. Once I learned them, I wrote a script that generated HTML pages from the information in the database and entered each one into the WordPress database as a new post.
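In outline, that script might look like the following. It assumes it runs with WordPress loaded (e.g. via `wp-load.php`) so that `wp_insert_post()` is available; `buildPostBody()` is my own stand-in for the HTML-generation step:

```php
<?php
// Turn one database row into the HTML body of a post. The exact
// markup here is illustrative, not the original template.
function buildPostBody(array $movie): string {
    return '<p>' . htmlspecialchars($movie['description']) . '</p>'
         . '<p><strong>Starring:</strong> '
         . htmlspecialchars($movie['actors']) . '</p>';
}

// Create the WordPress post. Requires WordPress to be loaded.
function publishMovie(array $movie) {
    return wp_insert_post([
        'post_title'   => $movie['title'],
        'post_content' => buildPostBody($movie),
        'post_status'  => 'draft', // published later, a few per day
    ]);
}
```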
I decided I did not want to flood my new website with thousands of pages of content right away, so I created a cron job that publishes four new posts per day. This lets search engines index my content gradually while giving the impression of a very active website. One thing I did was add a noindex tag on each page, but only for Googlebot. I predicted that Google would flag my website as spam under the Panda algorithm, so I noindexed it right off the bat. I also predict that both Bing and Yahoo will eat this content up and deliver a lot of traffic. You may be wondering why I would noindex my whole site with Google. Well, I didn’t! I also created a ton of hand-written, indexable content in the form of 500-word articles, like the Harry Potter movie order article. Only the auto-generated pages are noindexed.
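The Googlebot-only noindex could be done by sniffing the User-Agent; a minimal sketch, assuming the tag is emitted into the page `<head>` (the function name is my own):

```php
<?php
// Return a noindex robots tag only when the request comes from
// Googlebot; Bing, Yahoo, and regular visitors get nothing.
function robotsMetaFor(string $userAgent): string {
    if (stripos($userAgent, 'Googlebot') !== false) {
        return '<meta name="robots" content="noindex, follow">';
    }
    return '';
}

// In the page template: echo robotsMetaFor($_SERVER['HTTP_USER_AGENT'] ?? '');
```

A simpler route, if I recall Google's documentation correctly, is a static `<meta name="googlebot" content="noindex">` tag, which only Googlebot honors and needs no User-Agent check.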
So far my experiment is young, with only 60 posts live. However, as the new posts come in, I predict a significant amount of traffic from both Yahoo and Bing. Only time will tell, but one thing is clear: I’ve sharpened my PHP skills for future projects.