Note: This blog is a place where I share how I feel or what I’m doing, often to let off some steam; however, occasionally, it’s just a place where I park notes to myself in case need them later. This article is one such note to myself; however, you still might learn a thing or two about how your tax dollars are spent. If you read on, remember that you’ve been warned.
Background—including Jargon and Vocabulary
In terms of technology, one of the most backwards places to work is the State of California. Yep, while the State may be home to Big Tech, your government is stuck in the past. In a sense, I can’t blame them. Government has zero incentive to become efficient, innovate, or do better. One reason is that everybody is represented by a union whether or not they are even a member of said union. As such, no jobs can be eliminated without their express permission—which never happens. Any media reports to the contrary are lies or smokescreens to trick the public—usually for sympathy to further the union’s grip on state government.
A legacy piece of technology which is the aging backbone of California’s finances is CalSTARS (California State Accounting and Reporting System). I have written about this software before. It is a Unix based mainframe system that was brought online when Ronald Reagan was President in the 1980’s. This is the same software that desperately needed patching as part of the Y2K scare. It is still in use today.
California has spent over a billion dollars of your money to try and get their new accounting system deployed to replace CalSTARS, but as usual, it is many years behind schedule and hundreds of millions over budget. This gem is called FI$Cal (Financial Information System for California). The acronym is pronounced fis-cal [with short “I”) Think of it as pausing between syllables of the word “fiscal”.
The next thing you need to know is that within the State of California, there is no formal way for one agency to talk with another. Lazy people might blame the right to privacy in the State Constitution. This privacy clause was sold to voters as a barrier to keep agencies from sharing information and to cripple “Big Brother” from spying on citizens. Voters we told that by passing it, government would be prevented from compiling comprehensive files to track citizens. Think Soviet Union Politburo and KGB surveillance.
I have two comments on that promise. First, who needs government to do that when we have credit agencies and Big Tech to do it for you. If you want the goods on a conservative, just ask Big Tech; if you want the same info on a Liberal you better get a court order. Such is life in Biden’s America.
Second, somehow liberals argue the this right to privacy—banning the government from keeping comprehensive records on citizens—must also include a right to abortion. Abortion was never mentioned by the campaign advocating passage of the right to privacy back when it was passed in the 1970’s.
Anyway, it is impossible for peers within agencies to speak directly with each other unless you are a lifer in state service and happen to have a personal relationship with someone in another agency; usually because you were once coworkers. If my agency has a question for the State Controller, Franchise Tax Board, Dept of Motor Vehicles or any other state agency, we get to call the very same 800 number that you do as a member of the public to try and get help. Needless to say, with no way to prove who I am, where I’m calling from, or why my question is related to my duties as a state employee, I can get no help. Managers have no secret backdoor or liaison to go thru to get answers either.
This communication barrier extends to email also. Even when I have the email address of someone, say at the State Controller’s Office, my experience has been that email outside that agency is refused by the mail server and/or firewall. In the wake of Covid and other state activities, perhaps this will eventually change.
My point in bringing up communication is this, no one at my agency can get the State Controller to find a better way to send us accounting documents other than snail mail. The accounting documents that we get arrive in drips and drabs (when they make it to us). Each envelope or box has two copies of each accounting document. These must then be sorted and scanned into Adobe PDF files. These PDF files are then run thru an OCR program and then manually attached into our accounting system. My question, for the entire time that I have been a state employee (12 years), is why can’t we get this information in an electronic form?
The answer is simply that no one in my agency knows who to ask at SCO to make this happen or even if it’s possible.
Anyway, we typically get 2,500 – 3,000 unique pages, plus their duplicates, per month from the State Controller. When SCO screws-up employee payroll, then we get extras. Lots of extras. In January, we were supposed to get over 19,500 extra accounting documents (plus their duplicate copies). Snail mail being what it is, the documents did not arrive in a timely manner and yours truly was tasked with finding a way to get them from the old Unix system so we could manually attach them into our accounting system (not FI$Cal).
What follows is my account of how I solved the problem of getting these missing documents for my agency and making them usable.
SCO Prod
Via the Unix/mainframe computer program mentioned above, we can log in to the State Controller system with Read Only access. In theory, users can go other places too, but you need user and firewall permissions which I don’t have. Oh, I don’t have access to the State Controller site either. I have to use some else’s account.
You see, the Information Technology people where I work never interact with actual users, just each other. They have no clue what we need, they just take their best guess, filtered thru the lowest bidder, and deploy it to us. We are expected to like it, even if it is outdated or underpowered equipment when its brand new. Ditto for software. IT doesn’t care what our job is, just that each budget is spent and not exceeded. Thus, about five or six years ago, they went from giving access to almost everyone to taking it away. Apparently, the per seat costs were too high so they cut to the bone and beyond.
Anyway, using the log in information that I possess, I went hunting for the missing documents. We use a terminal program to access the mainframe. This software is capable of being configured to do bulk screen captures. Screen captures are the only way to get the missing documents. The only option is whether to print to a file or a physical printer.
Using my three hour per day window to be in the office, I went in on consecutive days to capture our missing documents. The missing documents were on two different dates. With a practical limit of about 80 screen captures per batch, I did over 35,000 screen captures during my six hours in the office. Each batch was saved as a PDF file. The PDF batches were then merged so all those on the same date were in a single file.
Filter Results
My next step was to sort the pages by agency, a four-digit number on each page. By doing research, I found that Adobe Acrobat had no built-in way to do this. I tried converting the file into Microsoft Word, an RTF file and a few others only to have Adobe Acrobat crash completely after the better part of an hour. After crashing Acrobat about a half dozen times, I gave up on any type of file conversion.
After more diligent research on the Internet, I found a different solution, a Java batch file run in Adobe Acrobat.
Here were the steps that I followed:
First, I had to see if it was possible and if so, how? This thread said it could be done.
Extract PDF Pages Based on Content
By clicking on the link found in the Correct Answer, I found some Java code to copy and paste. As is usual on the Internet, it was only part of a correct solution.
Extract PDF Pages Based on Content
If you look at the above two pages, you can figure out where to go in Acrobat DC to paste the code into the Action tab. However, the code is not correct.
The line:
var stringToSearchFor = “Total”;
is not the correct syntax. (Whatever is written inside the quotation marks is what the script searches for.)
This brings me to yet a third URL.
JavaScript String indexOf() Method
By looking at the below line of code, I found the missing thing needed above, parentheses.
var n = str.indexOf(“welcome”);
What the wonderful Java script mentioned above does is this; it looks for a string of text on the page and if it finds a match, it copies the page to a new file. Thus, I enter the agency number as the string to search for and then everything matching my agency is copied into a new file. As counter intuitive as it seems, I filter out what I want to keep and not the other way around.
The documents that were the result of the screen captures had only one problem, the font size was too small to be used in the macro process which I will describe below in a moment.
Increase Font in Acrobat Files
Once again, I found that Adobe Acrobat lacks a feature that I desperately need now. Acrobat has no ability to increase the font size of a PDF document. The reason for the need to increase font size is due to how the macro operates. The macro searches a location on the page for a document number. The margins on the page were too large and font too small. I felt that this would be a problem going forward.
Other than a few mentions about magnifying the size of a document on the page—which is not the same thing—I could find no solution. I came to the conclusion that improvisation was needed. I took the files created by the script above and then printed them to a new PDF file while increasing the magnification of the printed output to 115%.
The result was a page with a font size and page layout similar to the snail mail copies that we normally get. Once the font size was increased, I merged all files for the month into a single PDF.
Macro
Each month, I must take the existing PDF file for the month and run a series of macros on it. The end result of the macros is three parts:
- First, is a list of comments that is bulk uploaded into the accounting program.
- Second is that the merged file of all PDF documents is split into individual files that are labelled by document number.
- Third is a list of accounting documents that need to be manually uploaded one page at a time into the accounting program.
As part of the Macro process, each page of the PDF file is imported into an Excel workbook as a separate worksheet. This was the largest Excel workbook I every created with over 20,000 worksheets. This baby took lots of CPU power but unlike Acrobat, Excel didn’t crash under the strain.
Sadly, the next two steps of the macro failed to find the needed document number to continue.
After a review of a few worksheets, I noticed that the part of the monthly PDF file created by the process described above were all the same, which was good, but the rows were not where I needed them to be. There were blank rows at the top of the worksheets. I thought, what if I can remove the top row of all these at the same time?
In the dark recesses of my mind, I remembered that this was possible, but I forgot how. After a quick Internet search and creative use of the Shift key, I deleted the top row of one selected page and after an interminable wait, the first row was deleted from all the other selected worksheets as well.
I reran the failed macro steps and found that everything worked just like clockwork.
Elapsed time for the above was a week of my life.
Conclusion
Now 18 of us are manually attaching all the documents created above. Each of us has 1,080 (or more) PDF files that we are attaching one at a time to line items in the accounting program. This takes about 12 hours per person of uninterrupted time or about 216 manhours just for this month.
I’m sure it is possible to do this entire process in a matter of minutes if the people controlling the budget layer cared about a fiduciary responsibility for taxpayer money, but we work for the union not you so such waste in baked into the system.
Stay tuned for more tales of how your tax dollars are spent.