Amazon explains big AWS outage

Amazon explains big AWS outage (http://www.geekwire.com)

Amazon explains big AWS outage, says employee error took servers offline, promises changes.

Amazon has released an explanation of the events that caused Tuesday's big outage of its Simple Storage Service, also known as S3, which crippled significant portions of the web for several hours.

Amazon said the S3 team was working on an issue that was slowing down its billing system. Here’s what happened, according to Amazon, at 9:37 a.m. Pacific, starting the outage: “an authorized S3 team member using an established playbook executed a command which was intended to remove a small number of servers for one of the S3 subsystems that is used by the S3 billing process. Unfortunately, one of the inputs to the command was entered incorrectly and a larger set of servers was removed than intended.”

Those servers affected other S3 “subsystems,” one of which was responsible for all metadata and location information in the Northern Virginia data centers. Amazon had to restart these systems and complete safety checks, a process that took several hours. In the interim, it became impossible to complete network requests with these servers. Other AWS services that relied on S3 for storage were also affected.

About three hours after the issues began, parts of S3 started to function again. By about 1:50 p.m. Pacific, all S3 systems were back to normal. Amazon said it has not had to fully reboot these S3 systems for several years, and the program has grown extensively since then, causing the restart to take longer than expected.

Amazon said it is making changes as a result of this event, promising to speed up recovery time of S3 systems. The company also created new safeguards to ensure that teams don’t take too much server capacity offline when working on maintenance issues like the S3 billing system slowdown.
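
A safeguard like this can be pictured as a check that refuses a removal request when it would leave a subsystem with too few servers. The Python sketch below is only an illustration of that idea: the function name, the `min_required` threshold, and the numbers are all hypothetical and say nothing about how Amazon's actual tooling works.

```python
class CapacityError(Exception):
    """Raised when a removal would leave too few servers online."""


def servers_after_removal(active, to_remove, min_required):
    """Return the server count left after removal, or raise if unsafe.

    All three parameters are hypothetical illustrations:
    active       -- servers currently serving the subsystem
    to_remove    -- servers the operator asked to take offline
    min_required -- smallest count the subsystem needs to keep working
    """
    if to_remove < 0:
        raise ValueError("to_remove must not be negative")
    remaining = active - to_remove
    if remaining < min_required:
        raise CapacityError(
            f"refusing to remove {to_remove} servers: "
            f"{remaining} would remain, {min_required} required"
        )
    return remaining


# A small, intended removal passes the check.
print(servers_after_removal(active=100, to_remove=5, min_required=80))

# A mistyped input that asks for far more servers is rejected.
try:
    servers_after_removal(active=100, to_remove=50, min_required=80)
except CapacityError as err:
    print("blocked:", err)
```

The point of the design is that a typo in one input produces an error message instead of an outage.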

Amazon is also making changes to its service health dashboard, which is designed to track AWS issues. The outage knocked out the service health dashboard for several hours, and AWS had to distribute updates via its Twitter account and by programming in text at the top of the page. In the message, Amazon said it made a change to spread that site over multiple AWS regions.

Continue reading at http://www.geekwire.com

My Two Cents:
We were working with the ESRI ArcGIS Web Services API when it went down. I was surprised, because I was not aware that ESRI leveraged the Amazon S3 cloud systems. If you are going to run API services, make sure you have redundancy. The old saying “do not put all your eggs in one basket” obviously still applies to some tech corporations.
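
On the client side, one simple form of redundancy is to try a primary endpoint and fall back to a backup when it fails. Here is a minimal Python sketch of that idea. It is not anything ESRI or Amazon provides: the `fetch` callable and the endpoint names "primary" and "backup" are hypothetical stand-ins for real API calls.

```python
def call_with_failover(endpoints, fetch):
    """Try each endpoint in order and return the first successful result.

    endpoints -- ordered list of endpoint names (hypothetical)
    fetch     -- callable that takes an endpoint and returns data,
                 raising an exception when that endpoint is down
    """
    errors = []
    for endpoint in endpoints:
        try:
            return endpoint, fetch(endpoint)
        except Exception as err:  # any failure moves us to the next endpoint
            errors.append(f"{endpoint}: {err}")
    raise RuntimeError("all endpoints failed: " + "; ".join(errors))


def fake_fetch(endpoint):
    """Simulated service call: the primary is down, the backup works."""
    if endpoint == "primary":
        raise ConnectionError("service unavailable")
    return {"status": "ok"}


used, data = call_with_failover(["primary", "backup"], fake_fetch)
print(used, data)
```

This only helps if the backup does not depend on the same underlying service as the primary, which was exactly the trap in this outage.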

Dell Laptop Data Recovery

A few weeks ago a Dell Inspiron laptop came in with a corrupted and failing hard drive. It belonged to the wife of a very good business friend. The personal data on the laptop was irreplaceable: family pictures, movies, and personal finance records. They had taken the laptop to Geek Squad, who couldn’t do anything for them except replace the hard drive, and they had called Dell, who told them it was a lost cause. So as a last resort they asked me if I would take a look at it. They had NO backup!
Basically the drive was failing mechanically. It was still alive, but it failed every test, and I was surprised it was still functioning. Windows Vista 32-bit was installed, and it would not boot, enter any safe mode, repair, or reinstall on the partition. The USB ports were not functioning, and the keyboard was broken too. So I ordered a new hard drive and a new keyboard. When they arrived I replaced the keyboard and got to work trying to get a USB drive operational to back up any data I could get off the partition. Nope, that never worked. I tried every basic backup and recovery option available, to no avail.
Right when I was going to give up and replace the drive, I saw that the Dell had a recovery partition on the drive that was 600 MB in size, just enough space that I could install an OS on it. The CD drive still worked! So I went for it, although I was a little worried because the recovery system on the partition didn’t work. I got the OS installed (Windows Vista) and was able to access the damaged partition. I ran a disk scan on the damaged partition and it was successful. I attached the laptop to a backup device and succeeded in getting all of the data off the laptop. Every picture and movie was fine! I then replaced the hard drive with a new, larger, and faster one, installed Windows 7 Ultimate for them, and gave them backup instructions!

Rule Number one: Backup Your Data.
Rule Number two: Make sure you have friends in Technology. We will work for a few beers after work for our friends!
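
Rule number one does not need fancy software. As a rough illustration, here is a small Python sketch that copies a folder tree to a backup location and keeps going when a file cannot be read, which is what you want on a drive that is starting to fail. The folder names are made up for the demonstration, and this is a sketch of the idea, not the tool I used on the laptop above.

```python
import shutil
import tempfile
from pathlib import Path


def backup_tree(source, destination):
    """Copy every file under source to destination, skipping unreadable ones.

    Returns (copied, failed) lists of paths relative to source.
    """
    source, destination = Path(source), Path(destination)
    copied, failed = [], []
    for path in sorted(source.rglob("*")):
        if not path.is_file():
            continue
        relative = path.relative_to(source)
        target = destination / relative
        try:
            target.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(path, target)  # copy2 also keeps timestamps
            copied.append(relative)
        except OSError:
            # On a failing drive some files may be unreadable; keep going.
            failed.append(relative)
    return copied, failed


# Demonstration with throwaway temporary folders (hypothetical names).
with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "laptop"
    (src / "pictures").mkdir(parents=True)
    (src / "pictures" / "family.jpg").write_bytes(b"example")
    copied, failed = backup_tree(src, Path(tmp) / "backup")
    print(len(copied), "copied;", len(failed), "failed")
```

The `failed` list tells you which files to retry or chase with recovery tools instead of aborting the whole backup at the first bad sector.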

Firesheep One of the Problems!

Using Wi-Fi? Firesheep may endanger your security (CNN)

Using a public Wi-Fi network? Firesheep is endangering your security. First…what is it?

Firesheep is an extension developed for the Firefox web browser. The extension uses a packet sniffer to intercept unencrypted cookies from certain websites (such as Twitter, Facebook, eBay, and Amazon) as the cookies are transmitted over networks, exploiting session hijacking vulnerabilities. It shows the discovered identities in a sidebar displayed in the browser, and allows the user to instantly take on the log-in credentials of a victim by double-clicking on the victim’s name.

So when you are using a public Wi-Fi network, anyone else on the same network running Firefox with the Firesheep extension can intercept your private information and steal it. I have used Firefox and the Firesheep extension here in NYC, where we have a Starbucks right across the street, and basically I will NEVER use public Wi-Fi, PERIOD, unless it's an emergency….
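
Whether a session cookie is exposed to this kind of sniffing depends largely on whether the site marks it `Secure`, which tells the browser to send it only over HTTPS. As a rough illustration, the Python sketch below uses the standard library's `http.cookies` module to check a `Set-Cookie` value for that flag. The cookie names and values are made up for the example, and this says nothing about how Firesheep itself is implemented.

```python
from http.cookies import SimpleCookie


def cookies_missing_secure(set_cookie_value):
    """Return names of cookies in the header value that lack the Secure flag."""
    jar = SimpleCookie()
    jar.load(set_cookie_value)
    return [name for name, morsel in jar.items() if not morsel["secure"]]


# Made-up example values, not taken from any real site.
print(cookies_missing_secure("session=abc123; Path=/; HttpOnly"))
print(cookies_missing_secure("session=abc123; Path=/; Secure; HttpOnly"))
```

A cookie that shows up in the returned list can travel over plain HTTP, which is the situation a sniffer on the same Wi-Fi network takes advantage of.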

The Blue Screen of Death in Windows

Anyone who has used a Windows-based operating system over the years has experienced the Blue Screen of Death. The cause of this startup error ranges from a not-so-serious error in a program running on your system to a very serious problem like a hard drive failure. So make sure you have backups!

The best practice when the Blue Screen of Death occurs is to have your Windows installation media, such as the CD, DVD, or USB thumb drive, on hand to boot the system with. (A lot of computer professionals start troubleshooting by pressing the F8 key on startup and working through the advanced Windows startup options, like safe mode and safe mode with command prompt. This wastes time in the long run, because 60% of the time it will not work, period.) Here is my simple way, which works every time unless the hard drive is hosed. I will use Windows XP for this example.

Insert the Windows media and boot from it. When the Windows installation starts, let it get to the options of what you want to do: install a fresh copy, or repair an existing installation. Select repair an existing installation by typing R. The system will then go to the repair system, which is a lot like DOS. At the C:\ prompt type CHKDSK /R, which looks like this: C:\chkdsk /r
Press Enter. The system will ask whether you are sure you want to run this. Type Y and let it run. It will take anywhere from 10 minutes to 1 hour, depending on your system. When it has completed, 90% of the time your system will be fixed.
In another 9% of cases the boot sector was hosed. This is not a problem. Boot from your installation media again, select R to repair an existing installation, and at the command prompt type FIXBOOT, which should look like this: C:\fixboot  The system will ask whether you are sure. Type Y and press Enter. After it finishes, type EXIT and reboot. Your system will be back to normal.
For the remaining 1% that this did not fix, you will need to install a fresh copy of your operating system, and most of the time this will not work because your hard drive has failed or is failing. When I have a system where none of this works, 100% of the time the hard drive has failed. So I replace the drive with a new one, install my OS and applications, and I am good to go.
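
The escalation order described above can be written down as a tiny lookup, which is handy if you want to hand the procedure to someone else. This Python sketch only encodes the order of the steps from this post; it does not run any repair commands, and the `next_step` helper is a made-up name for illustration.

```python
# Ordered escalation steps, as described in the text above.
STEPS = [
    "boot from install media, press R, run: chkdsk /r",
    "boot from install media, press R, run: fixboot",
    "install a fresh copy of the operating system",
    "replace the hard drive, then install the OS and applications",
]


def next_step(failed_attempts):
    """Return the step to try after the given number of failed attempts."""
    if failed_attempts < 0:
        raise ValueError("failed_attempts must not be negative")
    # Once every earlier step has failed, the last one is the final answer.
    index = min(failed_attempts, len(STEPS) - 1)
    return STEPS[index]


for attempts in range(len(STEPS)):
    print(attempts, "->", next_step(attempts))
```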