Proofreading Scraped Blogs
Proofreading Scraped Blog recipes
Recipes from the food blogs EYB indexes, unlike those from books and magazine issues, are not indexed completely from scratch; instead, EYB "scrapes" the recipes from the blog and maps some of the data into EYB's recipe format. Generally, the data scrape should download the recipe name, the Online URL, and the publication date (date posted). The converted recipes then need to be proofread to complete the EYB categories, ingredients, and any other applicable fields, as well as to edit the scraped recipe titles to conform to EYB indexing standards.
A blog assigned to you for indexing appears on your "Books assigned to index" list just as a book or magazine issue does. When you access the blog in Indexing, you will see that the Recipes Added list contains all the recipes scraped from the blog.
Every recipe created in the EYB scrape needs to be reviewed and proofread against the original recipe on the blog. Before you start proofreading, open the blog in an adjoining tab or window; you can then paste in the URL from EYB's Online URL field for each recipe to view it on the blog. Then index and edit the recipes as follows:
- Recipe Title: The data scrape capitalizes only the first word in the recipe title, so restore the missing capitals in geographic place names, proper names, and words capitalized in EYB ingredient names, e.g., French, Tuscan, Parmesan, Amaretto. Titles also need to be reviewed for the correct order and format of English and foreign titles and for all other formatting rules outlined in the Recipe Title section of the manual. NOTE: Some bloggers precede every recipe title with the prefix "Recipe:"; delete these prefixes so that just the recipe title remains.
- Other indexing fields: All EYB category fields (Recipe Types, Ethnicities, Courses, Occasions, and Nutrition), the Accompaniment and EYB Comments fields, and the Ingredients field need to be completed as you normally would for book/magazine recipes.
- Online recipe-specific fields: The fields at the bottom of the data entry form apply to online recipes such as those from blogs. The Date Published and Online URL values are supplied by the scrape, but the Photo field needs to be reviewed, and the Photo Credit and Author fields need to be completed where applicable. For instructions specific to these fields, see Photos and Photo Credits and Authors below (click on "More on Authors for Blog recipes..." to expand that section).
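For indexers who like to see a rule spelled out, the Recipe Title cleanup above can be sketched as a short Python script. This is only an illustration, not part of EYB's tooling: the function name clean_title and the small PROPER_WORDS list are hypothetical, and the real list of capitalized words comes from EYB ingredient names and the Recipe Title section of the manual.

```python
import re

# Hypothetical sample list; the real set of capitalized words is defined by
# EYB ingredient names and the manual's Recipe Title rules.
PROPER_WORDS = {
    "french": "French",
    "tuscan": "Tuscan",
    "parmesan": "Parmesan",
    "amaretto": "Amaretto",
}

# Matches a leading "Recipe:" prefix in any letter case.
PREFIX = re.compile(r"^\s*recipe:\s*", re.IGNORECASE)


def clean_title(scraped: str) -> str:
    """Drop a leading "Recipe:" prefix and restore known capitals."""
    title = PREFIX.sub("", scraped).strip()

    def restore(match: re.Match) -> str:
        word = match.group(0)
        # Replace the word only if it is in the known list.
        return PROPER_WORDS.get(word.lower(), word)

    title = re.sub(r"[A-Za-z]+", restore, title)
    # Keep the first character of the title capitalized.
    return title[:1].upper() + title[1:]


print(clean_title("Recipe: tuscan bean soup with parmesan"))
```

A script like this covers only the mechanical part of the job. The order and format of English and foreign titles still need to be checked by eye against the manual.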
Deleting non-recipes from blogs -- The recipe scrape often captures blog posts that are just articles and do not contain a recipe. When you go to the Online URL in a scraped recipe and find that there is no indexable recipe in the blog post, you need to delete that "recipe" from the Recipes Added list. You can either delete such entries as you go, or mark them for deletion with an Indexer Note, then filter on that note and delete them all at once when you have finished going through all the recipes. Refer to Editing and deleting recipes.
Adding recipes missed in the blog scrape -- Because the scraping process is not foolproof, recipes are sometimes missed during the scrape and must be added manually. If the blog has a comprehensive recipe index or a monthly archive that lists recipe posts, it's a good idea to compare that list against the EYB recipe list during final proofreading. Any missing recipes need to be added as described below in Inserting a missed/new recipe. Also note the following when inserting blog recipes:
- The Sort Order value for blog recipes is based on the Date Published value, with the oldest posted recipes at the beginning of the Recipes Added list and the most recently posted recipes at the end; this order ensures that the newest blog recipes will appear first on EYB, where the default display is reverse chronological order. It is therefore important that any missed recipes be inserted more or less in their proper date-posted order. For example, a recipe from July 2008 should be inserted among other recipes posted during the same month/season of that year.
- Don't forget to supply the date posted in the Date Published field. Only the month, day, and year need to be entered; EYB will supply a default time stamp.
- When supplying the Online URL value, be sure to use the actual URL for the entire recipe, and not a link to an index page, archive list, or article that only lists/references the recipe but does not display it.
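The comparison and date-order placement described above can be pictured with a minimal Python sketch. It is illustrative only and assumes you have copied the blog's recipe index URLs and the EYB Online URL values into two lists by hand. The function names missing_urls and insert_position, and the example.com addresses, are hypothetical and not part of any EYB tool.

```python
from bisect import bisect_right
from datetime import date


def missing_urls(blog_index_urls, eyb_urls):
    """Return blog URLs that have no matching EYB Online URL.

    Trailing slashes are ignored when comparing (an assumption about how
    the two lists might differ).
    """
    seen = {url.rstrip("/") for url in eyb_urls}
    return [url for url in blog_index_urls if url.rstrip("/") not in seen]


def insert_position(published_dates, new_date):
    """Return the list position for a missed recipe.

    published_dates must already be sorted oldest first, matching the
    Recipes Added list; the new recipe goes after any with the same date.
    """
    return bisect_right(published_dates, new_date)


blog = ["https://example.com/plum-tart/", "https://example.com/fig-jam"]
eyb = ["https://example.com/plum-tart"]
print(missing_urls(blog, eyb))

dates = [date(2008, 5, 1), date(2008, 7, 3), date(2008, 7, 20), date(2008, 9, 2)]
print(insert_position(dates, date(2008, 7, 10)))
```

In this example the recipe posted on 10 July 2008 lands between the two other July 2008 recipes, which is the "more or less in date order" placement the Sort Order rule asks for.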