A Good Article this time, which calls for a slightly different approach to editing from the one I take for Stub- and Start-class articles. Fewer than 0.5% of articles on Wikipedia are rated Good or above. As there is much more content than a random page usually throws up, I may or may not edit the whole article. If I choose not to, I will often take the lead or the last section and edit that before moving on (or sometimes a section at random if it looks interesting). Also, rather than making all the edits in one or two sessions, I will try to break them down, so that reasoning can be given for each group and so that another editor can revert something they don't like (or that is wrong) without having to trash all of the edits. If you are going to get feedback from another editor, articles like this are often where it happens, so it is sensible to be on your best (editing) behaviour.
With all that in mind, let's get to Jack Warhop's article. Warhop was an early 20th-century baseball player. The page statistics show about 60 editors but only 125 edits, and the page gets about 5 views per day. There was, at least, something on the talk page (so often these are blank), but it was just a peer review (the process of getting other editors to mark your homework) from 2013. That was when the last major set of edits was applied, creating the bulk of the page and earning the article its Good Article status.
I decided to start at the top with the opening paragraph, intending probably to tackle only that section and then move on, as the article's topic isn't that interesting to me (although the content itself was quite engaging). An undefined abbreviation, a wiki-link, and then I got to the last sentence of the opening paragraph. It was a few lines on whether Jack was 5'8" or 5'9". Well, does this really matter, and if it does, is it important enough to be in the lead paragraph? I thought not on both counts. It also looks a bit like original research, which is frowned upon. A deletion is probably in order.
I really dislike doing this, but from personal experience I know that some editors have no issues with it at all. If the text I am removing is large, interesting or debatable, I will often copy it to the article's talk page with a query about whether it should be included. In this case, however, I can't really see any merit in it (and the editor who supplied it has since been permanently banned from editing…). So I will radically shorten it and keep the references, so that a really curious reader can explore the issue of Jack Warhop's height in more depth.
Unfortunately, this deletion then pricked my conscience, and I felt I should go through the whole article as penance. Overall it was a fairly straightforward edit: a series of references to be expanded and polished, and a string of baseball terminology wiki-links to be added. Two references were just page references to a third, so they could be condensed into a single reference cited with {{r|refname|pagenumber}} templates. I also found a few dead links and had to tag all the NY Times citations with |url-access=subscription, which adds a red padlock indicating their lack of open access.
For the last dead link, it was straightforward to find the article on another part of the referenced website, Genealogy Trails, and for the first one the Wayback Machine came up trumps, as the page on Newspapers.com had been archived.
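Checking the Wayback Machine for a dead link can also be done programmatically. The sketch below uses the Internet Archive's public availability endpoint (`https://archive.org/wayback/available`), which returns the closest archived snapshot of a URL as JSON. The function names are my own, and only `find_archived_copy` touches the network; the other two are pure helpers.

```python
import json
import urllib.parse
import urllib.request
from typing import Optional

# Public Wayback Machine availability endpoint.
AVAILABILITY_API = "https://archive.org/wayback/available"


def availability_query(url: str, timestamp: Optional[str] = None) -> str:
    """Build the query URL asking for the closest archived snapshot of `url`."""
    params = {"url": url}
    if timestamp:  # optional YYYYMMDDhhmmss hint; the capture closest to it is returned
        params["timestamp"] = timestamp
    return AVAILABILITY_API + "?" + urllib.parse.urlencode(params)


def closest_snapshot(response: dict) -> Optional[str]:
    """Return the closest snapshot URL from an availability response, or None."""
    closest = response.get("archived_snapshots", {}).get("closest")
    if closest and closest.get("available"):
        return closest.get("url")
    return None


def find_archived_copy(url: str) -> Optional[str]:
    """Query the live API (requires network access)."""
    with urllib.request.urlopen(availability_query(url), timeout=30) as resp:
        return closest_snapshot(json.load(resp))
```

An empty `archived_snapshots` object means nothing was captured, which is the point where you fall back to searching the original site, as with Genealogy Trails.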
But for the middle one, which used a Google News link, archived pages were present but the actual page image hadn't been captured. I wasn't aware of the Google News newspaper archive, but judging by its formatting it appears to be an old service. I'm not sure how many titles are archived; many hundreds at least. The dead link's reference was to The Milwaukee Sentinel, a title since merged into The Milwaukee Journal Sentinel. Google did have a copy of the paper's archives but has since been told to take it down. As usual, grubby commercial interests are responsible (NewsBank; I won't provide a link), and they are currently trying to extort the public library system in Milwaukee for access to archives that they provided the material for in the first place. Bottom line: I can't find a way to access this one.
That left the handful of NY Times articles, which may have been accessible in the past but are now hidden behind a paywall (though not a very high one; more later), even though they are long out of copyright. For most of the stories, the Internet Archive has the relevant issues of the paper imaged. So I will happily strip out all the URL links to the NY Times website in favour of the Internet Archive and make them open access. That worked for five of the seven articles.
I had to get more devious for the last two NY Times issues, which for some reason the Internet Archive doesn't hold. The first reference is to the 6th July 1912 issue of the paper: the NY Times webpage gives the title of the story, and the rest is obscured by a demand for payment. Not a vast amount, but I don't have a US$-denominated account, so it is not much use to me or to most of the rest of the world.
But as the link is on Wikipedia, the Internet Archive's web-spiders will have crawled it, so we can view the archived page without the demand. Here at least we can see the précis and a Download PDF link, but we still can't see the article.
Clicking the PDF link leads to a page stating that the webpage at the URL is available but not archived. The second reference (23rd Aug 1912) follows the same pathway leading to an unarchived page.
So a dead end?
Well, actually, I had tried this route before finding the Internet Archive's NY Times holdings. As you might imagine, I started at the top of the list of seven articles and worked my way down. For the first reference (10th May 1911), for some reason, the Download PDF button worked and passed through to an archived copy of the story. Now that we have the web address of a working PDF copy, we can see that the files are organised into year/month/day folders and that individual stories are given a nine-digit ID number. With this structure, all I need are the two articles' nine-digit ID numbers, and I might be able to reuse the same layout to stroll around the paywall and get to the content.
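To make the structure of the working link explicit, here is a small Python sketch that splits a TimesMachine PDF URL into its issue date and article ID. The pattern is inferred from that single working URL (year/month/day folders followed by a nine-digit ID); it is an observation, not a documented NY Times scheme, and `parse_pdf_url` is a hypothetical helper name.

```python
import re
from datetime import date
from typing import NamedTuple

# Inferred from one working link: /timesmachine/YYYY/MM/DD/<nine-digit ID>.pdf
PDF_PATH = re.compile(r"/timesmachine/(\d{4})/(\d{2})/(\d{2})/(\d{9})\.pdf")


class TimesMachineRef(NamedTuple):
    issue_date: date
    article_id: str


def parse_pdf_url(url: str) -> TimesMachineRef:
    """Split a TimesMachine PDF URL into its issue date and article ID."""
    match = PDF_PATH.search(url)
    if match is None:
        raise ValueError(f"not a TimesMachine PDF URL: {url}")
    year, month, day, article_id = match.groups()
    return TimesMachineRef(date(int(year), int(month), int(day)), article_id)


ref = parse_pdf_url(
    "https://timesmachine.nytimes.com/timesmachine/1911/05/10/105027109.pdf?pdf_redirect=true&ip=0"
)
print(ref.issue_date, ref.article_id)
```

The date part of a missing story is already known from the citation, so the only unknown is the nine-digit ID.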
Returning to the 6th July 1912 story: the NY Times site doesn't give us this number, as we can't get round the banner, and neither does the page's HTML source. The Internet Archive webpage is also unhelpful, but that page's source code gives the target of the button as article ID 104899650.
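Digging the ID out of a long page source by eye is tedious, so a regular expression can do the search. This is a sketch under a loudly stated assumption: I am supposing that the button's target in the archived page's HTML contains a path of the form `1912/07/06/<nine digits>`. I have not reproduced the real markup here, so the pattern may need adjusting, and `candidate_article_ids` is a hypothetical helper.

```python
import re
from datetime import date
from typing import List


def candidate_article_ids(page_source: str, issue_date: date) -> List[str]:
    """Return nine-digit IDs that follow the issue's date path in the HTML.

    ASSUMPTION: the Download PDF target contains a path like 1912/07/06/<nine digits>.
    """
    date_path = issue_date.strftime("%Y/%m/%d")
    pattern = re.compile(re.escape(date_path) + r"/(\d{9})\b")
    seen: List[str] = []
    for article_id in pattern.findall(page_source):
        if article_id not in seen:  # keep first-seen order, drop duplicates
            seen.append(article_id)
    return seen


# Hypothetical snippet standing in for the archived page's source.
html = '<a href="/timesmachine/1912/07/06/104899650.pdf?pdf_redirect=true">Download PDF</a>'
print(candidate_article_ids(html, date(1912, 7, 6)))
```

Anchoring the search on the known issue date keeps unrelated nine-digit numbers elsewhere in the page (tracking IDs and the like) out of the results.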
Thus, changing
https://timesmachine.nytimes.com/timesmachine/1911/05/10/105027109.pdf?pdf_redirect=true&ip=0
to
https://timesmachine.nytimes.com/timesmachine/1912/07/06/104899650.pdf?pdf_redirect=true&ip=0
and
https://timesmachine.nytimes.com/timesmachine/1912/08/23/104905152.pdf?pdf_redirect=true&ip=0
drops us straight into the required stories. Simples!
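The edit above is just string substitution: keep the base path and query string of the working 1911 link, and swap in each issue's date and ID. A minimal Python sketch of that substitution follows; `pdf_url` is a hypothetical name, and the query string is copied verbatim from the working link rather than from any documented API.

```python
from datetime import date

# Base path and query string copied from the working 10 May 1911 link.
BASE = "https://timesmachine.nytimes.com/timesmachine"
QUERY = "?pdf_redirect=true&ip=0"


def pdf_url(issue_date: date, article_id: str) -> str:
    """Build a TimesMachine PDF URL from an issue date and nine-digit article ID."""
    if not (len(article_id) == 9 and article_id.isdigit()):
        raise ValueError("expected a nine-digit article ID")
    return f"{BASE}/{issue_date:%Y/%m/%d}/{article_id}.pdf{QUERY}"


for day, article_id in [(date(1912, 7, 6), "104899650"), (date(1912, 8, 23), "104905152")]:
    print(pdf_url(day, article_id))
```

This reproduces the two URLs above; it only works for as long as the NY Times keeps serving PDFs at these paths.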
Just in case this is not a permanent state of affairs, I submitted both links to the Internet Archive's Save Page Now service so that they are captured, and that dispensed with all the red padlocks.
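Submitting a link to Save Page Now can also be scripted by requesting `https://web.archive.org/save/` followed by the target URL. This is a minimal sketch of that simple anonymous form only; the service may rate-limit or refuse such requests, and the Internet Archive also offers an authenticated API with more options that is not shown here. The target is appended as-is, so with links carrying a query string like the ones above, it is worth checking the resulting capture. The User-Agent string is an arbitrary placeholder.

```python
import urllib.request

SAVE_ENDPOINT = "https://web.archive.org/save/"


def save_request_url(target: str) -> str:
    """Form the Save Page Now URL for `target` (target appended unchanged)."""
    return SAVE_ENDPOINT + target


def request_capture(target: str) -> int:
    """Ask the Wayback Machine to capture `target`; returns the HTTP status (needs network)."""
    req = urllib.request.Request(
        save_request_url(target),
        headers={"User-Agent": "citation-archiver-sketch"},  # placeholder identifier
    )
    # urlopen raises urllib.error.HTTPError for 4xx/5xx responses such as rate limiting.
    with urllib.request.urlopen(req, timeout=120) as resp:
        return resp.status
```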
Finally, as I scanned some of the references, I discovered that Jack wasn't a coal miner but a coal shoveller, or fireman. I did wonder why the Chesapeake & Ohio Railroad employed coal miners. So my last task was to remove the Category:American coal miners link, and that was that.
Overall it took five days and ten edits, moving me up to third place in authorship contribution and adding about 10% of the text of the article.