
Warhop top-to-tail

A Good Article this time, which entails a slightly different approach to editing than that taken for stub and start class articles. Less than 0.5% of articles are rated as Good or above on Wikipedia. As there is much more content than a random page usually throws up, I may or may not edit the whole article. If I choose not to, I will often take the lead or last section and edit that before moving on (or sometimes a section at random if it looks interesting). Also, rather than making all the edits in one or two sessions, I will try to break them down so that reasoning can be given for each group, and so that another editor can revert something that they don't like (or that is wrong) without having to trash all of the edits. If you are going to get feedback from another editor, articles like this are often where it happens, so it is reasonable to be on your best (editing) behaviour.

With all that in mind let's get to Jack Warhop's article. Warhop was an early 20th century baseball player. The page stats show about 60 editors but only 125 edits and a page that gets about 5 views per day. There was, at least, something on the talk page (as these are so often blank) but it was just an article peer review (the process of getting other editors to mark your homework) from 2013 when the last major set of edits was applied to the article creating the bulk of the page and getting the article its Good Article status.

I decided to start at the top with the opening paragraph, intending to probably only tackle that section and then move on, as the article's topic isn't that interesting to me (although the content itself was quite engaging). An undefined abbreviation, a wiki-link and then I got to the last sentence of the opening paragraph. It was a few lines on whether Jack was 5'8" or 5'9". Well, does this really matter and if it does is it important enough to be in the lead paragraph? I thought no on both counts. Also, it looks a bit like original research, which is frowned upon. A deletion is probably in order.

I really dislike doing this. But from personal experience I know that some editors have no issues with it at all. If the text I am removing is large, interesting or debatable I will often copy it to the article's talk page with a query about whether it should be included or not. However, in this case I can't really see any merit in it (and the editor who supplied it has since been permanently banned from editing…). So I will radically shorten it and preserve the references so a really curious reader can explore the issue of Jack Warhop's height in more depth.

Unfortunately this deletion then pricked my conscience and I felt that I should go through the whole article as penance. Overall it was a fairly straightforward edit: a series of references to be expanded and polished and a string of baseball terminology wiki-links to be added. Two references were just page references to a third and so could be condensed to a single reference with {{r|refname|p=pagenumber}} templates. Also, I found a few dead links and had to tag all the NY Times citations with |url-access=subscription red padlocks indicating their lack of open access.

For the last dead link it was straightforward to find the article on another part of the referenced website, Genealogy Trails, and for the first one the Wayback Machine came up trumps as the page on Newspapers.com had been archived.
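For anyone who would rather script the Wayback Machine check than click through it, here is a minimal sketch using the Internet Archive's availability API. I did the lookups by hand, so this is not what was actually used; the function names are my own, and the sample address in the usage note is a placeholder.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

AVAILABILITY_ENDPOINT = "https://archive.org/wayback/available"


def availability_query(url, timestamp=None):
    """Build the availability-API query for a possibly dead link."""
    params = {"url": url}
    if timestamp:
        # YYYYMMDD: ask for the capture closest to this date
        params["timestamp"] = timestamp
    return AVAILABILITY_ENDPOINT + "?" + urlencode(params)


def closest_snapshot(response_text):
    """Return the archived URL from an API response, or None if nothing was captured."""
    data = json.loads(response_text)
    closest = data.get("archived_snapshots", {}).get("closest")
    if closest and closest.get("available"):
        return closest["url"]
    return None


def lookup(url, timestamp=None):
    """Network call: ask the Wayback Machine for the closest capture of url."""
    with urlopen(availability_query(url, timestamp), timeout=30) as resp:
        return closest_snapshot(resp.read().decode("utf-8"))
```

Calling lookup("example.com/page", "20130101") would return the address of the nearest capture, or None if the page was never archived. A hit only means the page itself was crawled, not that everything on it (such as a scanned image) was captured with it.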

But for the middle one, which used a Google News link, although archived pages were present the actual page image hadn't been captured. The newspapers archive of Google News wasn't a service I was aware of, but it appears to be an old service given its formatting. I'm not sure how many titles are archived; many hundreds at least. The dead link's reference was to The Milwaukee Sentinel, a title since subsumed by merger into The Milwaukee Journal Sentinel. Google did have a copy of the paper's archives but has since been told to take it down. As usual it is grubby commercial interests (NewsBank - I won't provide a link) that are responsible and are currently trying to extort the public library system in Milwaukee for access to archives that they provided the material for in the first place. Bottom line, I can't find access to this one.

That left the handful of NY Times articles that may have been accessible in the past but are now hidden behind a paywall (though not a very high one - more later) even though they are long out of copyright. For most of the stories the Internet Archive has the issues of the paper imaged. So I will happily strip out all the URL links to the NY Times website in favour of the Internet Archive and render them open access. That was good for five of the seven articles.

I had to get more devious for the last two issues of the NY Times, which the Internet Archive for some reason doesn't hold. The first reference is to the 6th July 1912 issue of the paper and the NY Times webpage gives the title of the story and the rest is obscured by a demand for payment. Not a vast amount, but I don't have a US$ denominated account. So not much use to me or most of the rest of the world.

New York Times paywall

But as the link is on Wikipedia the web-spiders at the Internet Archive will have crawled it, so now we are able to view the page without the demand. Here at least we can see the précis and a link to Download PDF, but still can't see the article.

Page as viewed from the Internet Archive

Clicking the PDF link leads to a page stating that the webpage at the URL is available but not archived. The second reference (23rd Aug 1912) follows the same pathway leading to an unarchived page.

So a dead end?

Well, actually, I tried this route before finding the Internet Archive's NY Times archive. As you might imagine, I started at the top of the list of the seven articles to work my way down. For the first reference (10th May 1911), for some reason, the Download PDF button worked and passed through to an archived copy of the story. Now that we have the web address of a working PDF copy we can see that the data is ordered by year/month/day folders and the individual stories are given a nine-digit ID number. With this archive structure all I need now are the two articles' nine-digit ID numbers and I might be able to use the same layout to stroll around their paywall and get to the content.

Returning to the 6th July 1912 story, the NY Times site doesn't give us this number, as we can't get round the banner, and neither does the page source HTML. The Internet Archive webpage is also unhelpful, but the source code of that page gives us the target of the button as article ID number 104899650.

Thus, changing

      https://timesmachine.nytimes.com/timesmachine/1911/05/10/105027109.pdf?pdf_redirect=true&ip=0

to

      https://timesmachine.nytimes.com/timesmachine/1912/07/06/104899650.pdf?pdf_redirect=true&ip=0

and

      https://timesmachine.nytimes.com/timesmachine/1912/08/23/104905152.pdf?pdf_redirect=true&ip=0

drops us straight into the required stories. Simples!
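The substitution is mechanical enough to write down as a small function. This is only a sketch of the pattern visible in the three addresses above: the function name is mine, and I am assuming that the year/month/day layout and the pdf_redirect=true&ip=0 query string hold for other issues too, which I haven't checked beyond these three.

```python
from datetime import date

BASE = "https://timesmachine.nytimes.com/timesmachine"


def timesmachine_pdf_url(issue_date, article_id):
    """Build a PDF address from an issue date and a nine-digit article ID."""
    if not (article_id.isdigit() and len(article_id) == 9):
        raise ValueError("expected a nine-digit article ID, got %r" % article_id)
    # The date becomes zero-padded year/month/day folders, e.g. 1912/07/06
    return f"{BASE}/{issue_date:%Y/%m/%d}/{article_id}.pdf?pdf_redirect=true&ip=0"


if __name__ == "__main__":
    print(timesmachine_pdf_url(date(1912, 7, 6), "104899650"))
    print(timesmachine_pdf_url(date(1912, 8, 23), "104905152"))
```

Run as a script, it prints the two 1912 addresses given above. The ID check is there because a mistyped number still produces a plausible-looking address that simply goes nowhere.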

Just in case this is not a permanent state of affairs I submitted both the links to the Internet Archive's Save Page Now service so that they are captured, and that is all the red padlocks dispensed with.
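I used the web form, but Save Page Now can also be triggered by putting https://web.archive.org/save/ in front of the address to be captured. A rough sketch of that follows. The function names and the User-Agent string are made up for illustration, and I am assuming that an anonymous request is accepted and that the final address after redirects points at the new capture.

```python
from urllib.request import Request, urlopen

SAVE_ENDPOINT = "https://web.archive.org/save/"


def save_page_now_url(target_url):
    """Prefix the target address with the Save Page Now endpoint."""
    return SAVE_ENDPOINT + target_url


def request_capture(target_url):
    """Network call: ask the Internet Archive to capture target_url.

    Returns the final address after redirects (assumed to be the capture).
    """
    req = Request(
        save_page_now_url(target_url),
        headers={"User-Agent": "warhop-ref-check/0.1 (hypothetical)"},
    )
    with urlopen(req, timeout=120) as resp:
        return resp.geturl()
```

Passing each of the two 1912 PDF addresses to request_capture would be the scripted equivalent of what I did by hand. Captures can be slow or rate-limited, so a failed request is worth retrying later rather than treating as final.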

Finally, as I scanned some of the references I discovered that Jack wasn't a coal miner, but a coal-shoveller or a fireman. I did wonder why the Chesapeake & Ohio Railroad employed coal miners. So my last task was to remove the Category:American coal miners link and that was that.

Overall it took five days and ten edits, moving me up to third place in authorship contribution and adding about 10% of the text of the article.
