San Antonio is known for its Tex-Mex and margaritas, yet something many people overlook is our technology scene. This is quickly changing.
Mashable published an infographic by Acuity Group that shows San Antonio as one of the strongest markets for technology hiring in 2012. The Alamo City is turning into a tech hotbed; not only is Rackspace’s corporate headquarters located here, but San Antonio is also home to TechStars Cloud (a startup incubator) and Geekdom (a collaborative workspace for entrepreneurs, technologists, developers, and makers).
Better yet, Rackspace was recently ranked on Fortune’s 100 Best Companies to Work For list, and number three on that list for job growth. If you are looking for a new job in technology, stop by the Racker Talent website to look at job postings and to find out more about Rackers and our culture. Additionally, if you have any questions about San Antonio, feel free to post them here.
While you can surely “Remember the Alamo!”, don’t forget that San Antonio is one of the strongest markets for technology hiring in the coming year.
Looking Back
In 2011 we added a total of seven edge locations to Amazon CloudFront and Route 53. We also added lots of new features, as I documented last year.
Looking Forward
Our newest edge locations are in Milan, Italy and Osaka, Japan. This brings our total worldwide location count to 26 (see the CloudFront page for a complete list). Each new edge location helps lower latency and improve performance for your end users.
Making Plans
We have additional locations in the pipeline for 2012 and beyond. Our planning process takes a number of factors into account, including notes from our sales team and discussions on the Amazon CloudFront forum. We also collect latency measurements from a number of points around the globe to our current set of locations and correlate them with broadband Internet penetration and existing Amazon CloudFront usage in the area.
I would also like to invite you to participate in the Amazon CloudFront Edge Location Survey. We are very interested in your suggestions for additional locations. We'd also like to learn a bit more about the type of content that you deliver to your customers.
All Aboard
The CloudFront team is hiring. We need some Software Development Engineers, a Senior Systems Engineer, a Senior Software Development Manager, a Product Manager, and a Business Development Representative.
-- Jeff;
The game industry today offers plenty of development engines to choose from. Unfortunately, there is no technology that allows all devices to plug into the same game server in real time. Almost 69 million Americans will be playing social games in 2012, according to analysis firm eMarketer. Within the Rackspace Startup Program, several companies cater to social games, the industry’s new buzzword, but their social play is limited to a single type of device. To Digital Harmony Games, that’s not social at all. That’s just online play with social features, stuck on its own device.
“Games are rapidly moving into the mobile space, delivering content directly to the consumer. While multiplayer games are just hitting the surface, mobile games should be platform agnostic, allowing a player to connect to any mobile device despite the carrier, distributor, or the platform,” says Keren Kang, Co-founder and COO, on why she and Jeff Lujan, CEO and Co-founder, launched Digital Harmony Games as part of the Austin Technology Incubator’s Landing Pad Program. “We’re a developer of B2B real time, cross-platform connective technology ‘Harmony Tech’ with over 50 years of collective experience in the industry and the social games to exploit that technology.”
“The mobile, tablet and browser space (gaming or not) is headed in the cross-platform connective era. Today this is available in a limited, turn-based capacity, reliant on a user waiting for another user to respond. In this way, the turn-based connectivity restricts socializing to a ‘wait-for-a-reply’ type of experience rather than real time voice or keyboard chatting. To us, that’s not social. That’s just a text message,” explains Keren. “We are developing a middleware network solution that enables any and all developers the ability to connect all smart phone, tablet, and browser platforms into the same environment.”
Digital Harmony Games chose to use Rackspace Managed Cloud to build its infrastructure. “Jeff and I have been using Rackspace since our last company, which developed massively multiplayer online role-playing games (MMORPGs). The service, location, and cost are the best around. The pros: location, customer service, Cloud capabilities, exchange server, uptime, CDN Connect-ability. The cons: email spam, but that gets filtered in the quarantine,” says Kang. “We’re complete nerds that want to make cool things. Rackspace Managed Cloud allows us to focus on our passions. We’re developing two games simultaneously, have become an official partner with leading US distributor TapJoy, are in negotiations for publishing our games with several top tier publishers, and are looking to raise our full series A funding.”
Digital Harmony’s patent-pending ‘Harmony Tech’, paired with its cross-platform games, is set to revolutionize the social games space while adding value to all industries that use mobile, browser, and tablet devices. Congratulations to Digital Harmony Games, and best of luck in its pursuit.
The Rackspace Startup Program strives to add value to a startup’s dream by offering Managed Cloud, allowing entrepreneurs to focus on developing their product while Rackspace manages their Cloud. Contact the Space Cowboys today to find out how Rackspace Managed Cloud can help make your startup dream a reality.
Recently, the General Services Administration (GSA) solicited bids to take care of its massive email system. It embarked on a process many businesses are familiar with: finding the right fit of features, security, and cost to support its users. The GSA selected hosted email to serve remote offices, streamline administration, and reduce future investments. Though you may not have the same concerns as a government agency, choosing the right business email system to accommodate users, fit the budget, and scale with growth is critical. We’ll help you narrow it down to the most important criteria to consider when choosing an email system.
Industry Standards: Because a coffee shop isn’t bound by the same industry or government regulations as a law firm, industry standards are a major factor in your choice. They drive your security and deployment options, because HIPAA, FINRA, and other industry-specific regulations directly address email administration and availability.
Support Resources: If you found tomorrow that all of your invoice emails were bouncing, what would you do? If your IT staff lacks the experience or bandwidth to manage the complexities of an email system, like spam/virus control, storage, and troubleshooting, you could be down for days. With some hosted or free email options, support limitations may hamper your ability to quickly resolve issues.
Financial Investment: With competing business priorities, sinking thousands into hardware and software isn’t always practical. As attractive as an on-site email deployment with dedicated staff looks, typically, the cost outweighs the benefit when dollars could be better spent on marketing or product development to grow the business.
Business Use: An outage in an office that uses email infrequently has a different impact than in an office relying on system-generated emails from a website or accounting system to process orders. Understand the relationship between email and other systems to determine how critical email is to business operations.
Office Structure: If your business has remote employees or employees who move between locations, your email system needs to give them access wherever they work. How messages are synced to other devices and locations governs how quickly users can access messages and respond to information.
Mobile Use: Even if you don’t officially support employees’ smartphones and tablets, that doesn’t mean they aren’t using them as a convenient way to stay in touch with the office. Know the difference between offering access to a mobile browser to check email and having a synced mobile email app with calendar, contact list, and folder access.
Once you’ve reviewed these criteria, you’ll be ready to decide between the convenience of a hosted email solution, an investment in an on-site solution, or a hybrid of the two. If you decide, like the GSA, to host your email, check out our planning tools to help you make the switch or contact us to discuss your options.
This is a guest post written and contributed by Alexander Negrash, Marketing Manager at CloudBerry Lab, a Rackspace Cloud Tools Partner. CloudBerry Lab provides a file manager and a backup utility to help Rackspace users take advantage of cloud storage (Cloud Files).
CloudBerry Explorer freeware is an FTP-like file manager that allows you to connect to any number of Rackspace accounts directly and manage containers and files. With CloudBerry Explorer, Rackspace Cloud Files become an extension to your local storage. You are no longer limited to the classic data storage on your local drive(s). With CloudBerry Explorer, you can move files to Cloud Files just as easily as managing them on your local drive(s). You can browse, create, and delete files as well as synchronize folders on your PC and cloud storage and more.
We are constantly developing the program, and we will add features such as URL generation (file sharing) and Capacity Reports (storage space tracker). We will also add an object metadata and ACL editor.
For those who don’t need the advanced file management of CloudBerry Explorer freeware, we are about to release CloudBerry Drive. This application is designed to expose Cloud Files storage as a local disk. With this approach, files in the cloud can be managed as if they were stored locally. Sign up to get the first release of CloudBerry Drive.
Another way to leverage Cloud Files is to use it as online backup storage. We are also about to update our online backup product to support Rackspace. CloudBerry Backup comes with a solid set of features, such as real-time backup and block-level updates, and helps automate data backup to Rackspace Cloud Files. CloudBerry Backup gives you full control over the backup process. In addition, backup data can be encrypted and compressed before it is sent to the cloud. The combination of a standalone program and Cloud Files storage will provide real security to Rackspace users.
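The post doesn’t describe how CloudBerry implements block-level updates, so the following is only a generic sketch of the technique, assuming a file is split into fixed-size blocks and each block is fingerprinted with SHA-256. On the next backup run, only blocks whose fingerprint changed (or that are new) would need to be uploaded. The block size and function names are illustrative, not CloudBerry’s.

```python
import hashlib

# Illustrative block size; real backup tools use much larger blocks (assumption).
BLOCK_SIZE = 4


def block_hashes(data: bytes, block_size: int = BLOCK_SIZE) -> list:
    """Split data into fixed-size blocks and return one SHA-256 digest per block."""
    return [
        hashlib.sha256(data[i:i + block_size]).hexdigest()
        for i in range(0, len(data), block_size)
    ]


def changed_blocks(old: bytes, new: bytes, block_size: int = BLOCK_SIZE) -> list:
    """Return indexes of blocks in `new` that differ from `old` or did not exist before.

    Only these blocks would need to be uploaded; truncation of `old` is not reported.
    """
    old_hashes = block_hashes(old, block_size)
    new_hashes = block_hashes(new, block_size)
    return [
        i
        for i, digest in enumerate(new_hashes)
        if i >= len(old_hashes) or old_hashes[i] != digest
    ]


previous = b"AAAABBBBCCCC"
current = b"AAAAXXXXCCCCDDDD"
print(changed_blocks(previous, current))  # [1, 3]
```

Fixed-size blocks are the simplest choice; an insertion near the start of a file shifts every later block, which is why some tools use content-defined chunking instead.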
So there are two ways Rackspace users can leverage Cloud Files with CloudBerry Lab tools: (1) file management with CloudBerry Explorer and CloudBerry Drive (coming soon) and (2) automated online backup to Cloud Files with CloudBerry Backup (Cloud Files support will also arrive shortly).
Today's guest blogger is Adam Gray. Adam is a Product Manager on the Elastic MapReduce Team.
-- Jeff;
We’re always excited when we can bring features to our customers that make it easier for them to derive value from their data—so it’s been a fun month for the EMR team. Here is a sampling of the things we’ve been working on.
Free CloudWatch Metrics
Starting today, customers can view graphs of 23 job flow metrics within the EMR Console by selecting the Monitoring tab in the Job Flow Details page. These metrics are pushed to CloudWatch every five minutes at no cost to you and include information on:
Please watch this video to see how to view CloudWatch graphs in the EMR Console:
You can also learn more from the Viewing CloudWatch Metrics section of the EMR Developer Guide.
You can view the new metrics in the AWS Management Console:
Further, through the CloudWatch Console, API, or SDK you can set alarms to be notified via SNS if any of these metrics go outside of specified thresholds. For example, you can receive an email notification whenever a job flow is idle for more than 30 minutes, HDFS Utilization goes above 80%, or there are five times as many remaining map tasks as there are map slots, indicating that you may want to expand your cluster size.
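In practice these alarms are configured in CloudWatch itself. The sketch below only makes the three example thresholds concrete by evaluating them against a single snapshot of job flow state; the field names are hypothetical and are not the CloudWatch metric names, and no AWS calls are made.

```python
from dataclasses import dataclass


@dataclass
class JobFlowSnapshot:
    # Hypothetical fields for illustration; not the actual CloudWatch metric names.
    idle_minutes: float
    hdfs_utilization_pct: float
    remaining_map_tasks: int
    map_slots: int


def breached_thresholds(s: JobFlowSnapshot) -> list:
    """Return which of the example alarm conditions this snapshot breaches."""
    breaches = []
    if s.idle_minutes > 30:
        breaches.append("idle for more than 30 minutes")
    if s.hdfs_utilization_pct > 80:
        breaches.append("HDFS utilization above 80%")
    # Five times as many remaining map tasks as map slots hints the cluster is too small.
    if s.map_slots > 0 and s.remaining_map_tasks >= 5 * s.map_slots:
        breaches.append("remaining map tasks at least 5x map slots")
    return breaches


snapshot = JobFlowSnapshot(idle_minutes=0, hdfs_utilization_pct=85,
                           remaining_map_tasks=100, map_slots=20)
print(breached_thresholds(snapshot))
# ['HDFS utilization above 80%', 'remaining map tasks at least 5x map slots']
```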
Please watch this video to see how to set EMR alarms through the CloudWatch Console:
Hadoop 0.20.205, Pig 0.9.1, and AMI Versioning
EMR now supports running your job flows using Hadoop 0.20.205 and Pig 0.9.1. To simplify the upgrade process, we have also introduced the concept of AMI versions. You can now provide a specific AMI version to use at job flow launch or specify that you would like to use our “latest” AMI, ensuring that you are always using our most up-to-date features. The following AMI versions are now available:
You can specify an AMI version when launching a job flow in the Ruby CLI using the --ami-version argument (note that you will have to download the latest version of the Ruby CLI):
$ ./elastic-mapreduce --create --alive --name "Test AMI Versioning" --ami-version latest --num-instances 5 --instance-type m1.small
Please visit the AMI Versioning section of the Elastic MapReduce Developer Guide for more information.
S3DistCp for Efficient Copy between S3 and HDFS
We have also made available S3DistCp, an extension of the open source Apache DistCp tool for distributed data copy, that has been optimized to work with Amazon S3. Using S3DistCp, you can efficiently copy large amounts of data between Amazon S3 and HDFS on your Amazon EMR job flow or copy files between Amazon S3 buckets. During data copy you can also optimize your files for Hadoop processing. This includes modifying compression schemes, concatenating small files, and creating partitions.
For example, you can load Amazon CloudFront logs from S3 into HDFS for processing while simultaneously changing the compression format from Gzip (the Amazon CloudFront default) to LZO and combining all the logs for a given hour into a single file. Because Hadoop processes a few large, LZO-compressed files more efficiently than many small, Gzip-compressed files (each small file typically becomes its own map task, and a Gzip file cannot be split across map tasks, whereas an indexed LZO file can), this can improve performance significantly.
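S3DistCp performs this grouping for you; the sketch below only illustrates the grouping key, assuming CloudFront’s standard log file naming (<distribution-id>.<YYYY-MM-DD-HH>.<unique-id>.gz) and hypothetical object keys. Every file that shares a distribution and an hour would be concatenated into one output file. It does not reproduce S3DistCp’s options or perform the Gzip-to-LZO conversion.

```python
import re
from collections import defaultdict

# Assumed CloudFront log name: <distribution-id>.<YYYY-MM-DD-HH>.<unique-id>.gz
LOG_NAME = re.compile(
    r"^(?P<dist>[A-Z0-9]+)\.(?P<hour>\d{4}-\d{2}-\d{2}-\d{2})\.[^.]+\.gz$"
)


def group_by_hour(keys):
    """Map (distribution, hour) to the log objects that would be merged into one file."""
    groups = defaultdict(list)
    for key in keys:
        name = key.rsplit("/", 1)[-1]  # drop any S3 prefix ("folder" path)
        match = LOG_NAME.match(name)
        if match is None:
            continue  # skip objects that don't look like CloudFront logs
        groups[(match["dist"], match["hour"])].append(key)
    return dict(groups)


keys = [
    "logs/E1ABC.2012-01-10-13.a1b2c3.gz",
    "logs/E1ABC.2012-01-10-13.d4e5f6.gz",
    "logs/E1ABC.2012-01-10-14.g7h8i9.gz",
    "logs/README.txt",
]
for group, files in group_by_hour(keys).items():
    print(group, len(files))
# ('E1ABC', '2012-01-10-13') 2
# ('E1ABC', '2012-01-10-14') 1
```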
Please see Distributed Copy Using S3DistCp in the Amazon Elastic MapReduce documentation for more details and code examples.
cc2.8xlarge Support
Amazon Elastic MapReduce also now supports the new Amazon EC2 Cluster Compute instance, Cluster Compute Eight Extra Large (cc2.8xlarge). Like other Cluster Compute instances, cc2.8xlarge instances are optimized for high performance computing, giving customers very high CPU capabilities and the ability to launch instances within a high bandwidth, low latency, full bisection bandwidth network. cc2.8xlarge instances provide customers with more than 2.5 times the CPU performance of the first Cluster Compute instance (cc1.4xlarge), more memory, and more local storage at a very compelling cost. Please visit the Instance Types section of the Amazon Elastic MapReduce detail page for more details.
In addition, we are pleased to announce an 18% reduction in Amazon Elastic MapReduce pricing for cc1.4xlarge instances, dropping the total per hour cost to $1.57. Please visit the Amazon Elastic MapReduce Pricing Page for more details.
VPC Support
Finally, we are excited to announce support for running job flows in an Amazon Virtual Private Cloud (Amazon VPC), making it easier for customers to:
You can launch Amazon Elastic MapReduce job flows into your VPC through the Ruby CLI by using the --subnet argument and specifying the subnet address (note that you will have to download the latest version of the Ruby CLI):
$ ./elastic-mapreduce --create --alive --subnet "subnet-identifier"
Please visit the Running Job Flows on an Amazon VPC section in the Elastic MapReduce Developer Guide for more information.
-- Adam Gray, Product Manager, Amazon Elastic MapReduce.
CloudU Notebooks is a weekly blog series that explores topics from the CloudU certificate program in bite sized chunks, written by me, Ben Kepes, curator of CloudU. How-to’s, interviews with industry giants, and the occasional opinion piece are what you can expect to find. If that’s your cup of tea, you can subscribe here.
The interesting thing about disruption is that when it occurs there are both opportunities and threats: opportunities for those who are prepared to stand on the edge of whatever the disruption is, and threats for those unable or unwilling to adapt. As I posted the other day, Cloud Computing is fundamentally changing the shape of the IT job. It’s undeniably disruptive to the IT industry, but this disruption extends beyond simply a threat/opportunity vector for IT; it affects more general roles as well.
Wanted Analytics, a real-time business intelligence company, recently posted some statistics around hiring for cloud computing skills. They found that over a 90-day period, recruiters posted more than 10,000 online job ads that included a requirement for cloud computing skills, 61% more than the same period a year ago. The graph below shows this growth.
The interesting thing here, though, is that these aren’t all technical roles: there’s a bunch of marketing and sales roles, along with customer service and even cargo and freight agent positions, that all demand a competency in cloud. It’s kind of analogous to typewriters. It used to be that only people looking for jobs in a typing pool needed to know how to type. Nowadays pretty much everyone needs some typing skills as a core competency for their role, and so too with cloud skills. That’s our reason for creating the CloudU program, and even more so the CloudU certificate: they’re an attempt to give people an entry-level introduction to Cloud Computing, something to whet their appetite and give them a grounding.
Life is about ongoing skill-building. I’m a big fan of lifelong learning, and the great thing about the disruption coming from cloud computing is that it creates a real impetus to build skills in this particular area, and those skills in turn will make people more valuable to current and prospective employers. Feedback from course participants (nearly 1,500 people have now signed up for the certificate, and close to 300 have graduated) suggests that what we’re doing is on the right track. As recent graduate Melissa Huebener says:
Cloud is the future of technology. This Certification serves a springboard for continuing education in this area. It supplies a wonderful indication to employers that I am willing to learn, change and grow in my career as technology advances forward.
We’re also getting great feedback from other educators. Steve Mallard from the Tennessee Technology Center sent us an unsolicited email thanking us for the program, saying:
CloudU is an excellent resource for anyone wanting to learn about Cloud Computing. As an instructor of information technology, the certificate provides a great learning tool for the planning, deployment and logistics behind cloud computing.
And without wanting to blow our own trumpet too much, someone also pointed out to us recently that we’ve been named one of the top 10 Cloud certifications in the industry – that’s pretty awesome praise!
Top 10 Cloud Computing Certifications, a presentation by Glen Roberts
CloudU is an exciting development and one that I’m really proud to be involved with. We’d love to have you join in the discussion!
As of the end of 2011, there are 762 billion (762,000,000,000) objects in Amazon S3. We process over 500,000 requests per second for these objects at peak times.
Here's the annual growth chart:
This represents year-over-year growth of 192%; S3 grew faster last year than it did in any year since it launched in 2006.
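As a quick sanity check on the 192% figure: growth of 192% means the object count was multiplied by 2.92 over the year, so the implied count at the end of 2010 is roughly 762 billion / 2.92. This is derived arithmetic, not a separately published number.

```python
def implied_previous_count(current: float, growth_pct: float) -> float:
    """Invert year-over-year growth: current = previous * (1 + growth_pct / 100)."""
    return current / (1 + growth_pct / 100)


end_of_2011 = 762e9
end_of_2010 = implied_previous_count(end_of_2011, 192)
print(f"{end_of_2010 / 1e9:.0f} billion")  # 261 billion
```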
Where are all of these objects coming from? Although we definitely made it easier for you to delete objects using Multi-Object Deletion and Object Expiration, we also gave you plenty of ways to upload new objects using Multipart upload, AWS Direct Connect, and AWS Import/Export.
As you can imagine, building, running, and adding new features to a system as large and as complex as S3 is no simple task. Here are some of the open positions on the S3 team:
-- Jeff;
We have added two new benefits to the Gold and Platinum levels of AWS Premium Support. The following features are now in beta testing:
Third-Party Support
If you have Gold or Platinum Premium Support, you can now ask questions related to a number of popular operating systems including Microsoft Windows, Ubuntu, Red Hat Linux, SuSE Linux, and the Amazon Linux AMI. You can ask us about system software including the Apache and IIS web servers, the Amazon SDKs, Sendmail, Postfix, and FTP. A team of AWS support engineers is ready to help with setup, configuration, and troubleshooting of these important infrastructure components.
We are also enabling the use of desktop sharing software, giving you the option to share your desktop with a support engineer as needed.
AWS Trusted Advisor
AWS Trusted Advisor draws upon best practices learned from AWS’ aggregated operational history of serving hundreds of thousands of AWS customers. The AWS Trusted Advisor inspects your AWS environment and makes recommendations when opportunities exist to save money, improve system performance, or close security gaps. The initial release of the AWS Trusted Advisor includes eight separate checks; we'll be adding more throughout 2012.
The checks are grouped into three families: fault tolerance checks, security audits, and cost optimizations. Here is the initial set of eight checks performed by AWS Trusted Advisor:
AWS Trusted Advisor does not have access to customer data. Recommendations are made by analyzing information gathered using a constrained set of internal and documented AWS API calls.
Here's a diagram to show you how it works:
Advice from the AWS Trusted Advisor is made available in several different forms. For certain issues, we will proactively create support cases and notify you that a given check has identified an opportunity for improvement. The AWS Support Engineers are also available to review AWS Trusted Advisor recommendations any time you call in for support. In the future a regular scorecard report will be available, as will an AWS Trusted Advisor Console with support for viewing, running, customizing, and even opting out of certain checks as desired.
These new features are available for all Gold and Platinum customers. What do you think? Leave a comment and let me know.
-- Jeff;
You can now add up to 10 tags to any of your Auto Scaling Groups. You can also, if you'd like, propagate the tags to the EC2 instances launched from your groups.
Adding tags to your Auto Scaling groups will make it easier for you to identify and distinguish them.
Each tag has a name, a value, and an optional propagation flag. If the flag is set, then the corresponding tag will be applied to EC2 instances launched from the group. You can use this feature to label or distinguish instances created by distinct Auto Scaling groups. You might be using multiple groups to support multiple scalable applications, or multiple scalable tiers or components of a single application. Either way, the tags can help you keep your instances straight.
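Here is a local model of the tagging rules described above, not the Auto Scaling API: each tag has a name, a value, and an optional propagation flag, a group holds at most 10 tags, and only flagged tags are copied to launched instances. Treating tag names as unique within a group (re-setting a name replaces its value) is an assumption of this sketch, and the class and method names are hypothetical.

```python
from dataclasses import dataclass, field

MAX_TAGS_PER_GROUP = 10  # limit stated in the announcement


@dataclass(frozen=True)
class Tag:
    name: str
    value: str
    propagate_at_launch: bool = False  # the optional propagation flag


@dataclass
class ScalingGroupModel:
    """Hypothetical stand-in for an Auto Scaling group, for illustration only."""
    group_name: str
    tags: list = field(default_factory=list)

    def set_tag(self, tag: Tag) -> None:
        # Assumption: tag names are unique per group, so re-setting replaces the old tag.
        others = [t for t in self.tags if t.name != tag.name]
        if len(others) >= MAX_TAGS_PER_GROUP:
            raise ValueError(f"{self.group_name} already has {MAX_TAGS_PER_GROUP} tags")
        self.tags = others + [tag]

    def tags_for_new_instance(self) -> dict:
        """Tags that would be applied to an EC2 instance launched from this group."""
        return {t.name: t.value for t in self.tags if t.propagate_at_launch}


web = ScalingGroupModel("web-tier")
web.set_tag(Tag("app", "storefront", propagate_at_launch=True))
web.set_tag(Tag("owner", "ops-team"))
print(web.tags_for_new_instance())  # {'app': 'storefront'}
```

Here the "owner" tag stays on the group only, while "app" would also label every instance the group launches.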
Read more in the newest version of the Auto Scaling Developer Guide.
-- Jeff;
Typically, system breaches that make news are massive attacks netting volumes of data from big name companies. However, research suggests that attacks on small-to-medium sized businesses are rising. Why? Because smaller organizations are ripe with security holes that make it easier for the bad guys to get in and wreak havoc on system operations, steal client lists, even access sensitive data like credit card and social security numbers. Not in the cloud. In the cloud, you’re better protected from malicious threats with multi-layered protection and IT security specialists whose only focus is maintaining and monitoring the fidelity of systems. The Cloud Avengers are all too familiar with the scourge of hackers. Watch as they bring a strong line of defense to shut down hackers before they infiltrate the system and let our IT pro get back to lunch.
Stay tuned tomorrow to find out what Cloud Avengers do to sinister service contracts. Check out more adventures from Cloud Avengers below:
Cloud Avengers Save the Day with Cloud Files
Cloud Avengers Knock Out System Crashes
Cloud Avengers Annihilate Software Bugs
Cloud Avengers Free the Server Room
Website Traffic Cleared by Cloud Avengers
Cloud Avengers Think Small for Data Storage
Many times we know a small business in need of recommendations and business help. Business owners almost always ask, “Am I the most efficient and profitable company I can be?” Perhaps they are a one or two employee shop and often wonder, “What tools can I use to help determine these cost-saving opportunities?” As businesses grow and mature, these are often questions we see in the Rackspace Startup Program and we love sharing how one startup can truly help another.
If you own a small business and are looking to grow, Profitably may be the startup with just the solution for you! Founded in 2010 by Adam Neary, Francis Hwang, and Chad Pugh, Profitably now boasts six employees and has raised both seed and Series A funding while calling General Assembly in NYC home.
Think of a web application that is able to read your QuickBooks data, analyze that information and then make business recommendations for improvement. The application provides prioritized next steps and recommendations, all while offering the small business owner the real time analytics to implement that assessment and put those recommendations into action.
The beauty of Profitably is how this 18-month-old company delivers the tools needed to help your business cut expenses and increase profits. Francis Hwang, CTO, attributes this to Rackspace Cloud hosting and the staging and production environments on which Profitably is built. Hwang shares, “Rackspace helps us iterate quickly and deliver customer value more quickly, so it’s great!”
Think of the possibilities for your small business: are you switching vendors or suppliers? Perhaps you are negotiating new contracts and need to identify areas of excess. Profitably can make your small business the most successful it’s ever been. And when it comes to success, there is no better place for startups to begin than the Rackspace Startup Program.
Partnering with Rackspace brings startups and small businesses the Cloud Computing resources and service needed to grow. Want to find out how your startup can become part of the Rackspace Startup Program? Contact us today.
This is a guest post written and contributed by Fred van den Bosch at Librato, Inc., a Rackspace Cloud Tools Partner. Librato is the creator of Metrics, a time series data platform that provides uniform monitoring and alerting for your operation.
Would you consider building your own Database Management System? Probably not. While talking with our Silverline customers, the majority of which are SaaS or PaaS providers, we concluded that the same should apply for the management of time-series data. This is what we heard over and over:
● They want to “measure everything” in their operation that can alert them to unusual events and help them find and fix the root cause of problems. Many of them use a continuous deployment methodology, which makes it even more important for them to detect and fix problems as early as possible.
● In order to measure everything, they need to deploy multiple tools, each with its own repository, user interface, event handling, and installation requirements. On top of that, they need to monitor metrics that are specific to their environment and need to build custom tools for collecting, storing, visualizing and acting on that data. All of this adds to their workload but, more importantly, makes it difficult to correlate data from different sources and do root cause analysis.
A time-series data management platform would allow the developers of monitoring tools to use their expertise to monitor all important data, while providing DevOps and operations teams with a uniform environment for storage, visualization, correlation and alerting.
That’s why we built Metrics, a time series data management platform, delivered as a service, and built from the ground up with an “API first” approach. That’s distinctly different from monitoring services that provide you with a “canned” solution for monitoring a specific set of metrics, sometimes with the option to add some custom metrics.
The Metrics platform uncouples metrics collection from storage, analysis and alerting, giving DevOps and operations staff the freedom to choose any combination of open source, commercial or custom collection tools and allowing application developers to instrument their applications in whatever way suits them best.
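To illustrate the decoupling described above, here is a hypothetical in-memory sketch (not the Metrics API): any collector, whether an open source agent, a commercial tool, or a custom script, writes points through one submit call, while alert rules are defined and evaluated independently of where the data came from. The class, method, and metric names are invented for this example.

```python
from collections import defaultdict
from typing import Callable


class TimeSeriesStore:
    """Hypothetical store separating collection (submit) from alerting (check_alerts)."""

    def __init__(self) -> None:
        self._series = defaultdict(list)  # metric name -> [(timestamp, value), ...]
        self._rules = []  # (metric name, predicate on latest value, message)

    def submit(self, name: str, timestamp: int, value: float) -> None:
        """Single entry point for every collector, regardless of its origin."""
        self._series[name].append((timestamp, value))

    def add_alert(self, name: str, predicate: Callable[[float], bool], message: str) -> None:
        self._rules.append((name, predicate, message))

    def check_alerts(self) -> list:
        """Evaluate each rule against the most recent point of its metric."""
        fired = []
        for name, predicate, message in self._rules:
            points = self._series.get(name)
            if not points:
                continue  # no data yet for this metric
            _, latest = max(points, key=lambda point: point[0])
            if predicate(latest):
                fired.append(message)
        return fired


store = TimeSeriesStore()
store.submit("db.connections", 1, 40)    # e.g. from a custom script
store.submit("db.connections", 2, 95)    # e.g. from an open source agent
store.submit("app.latency_ms", 2, 120)   # e.g. from application instrumentation
store.add_alert("db.connections", lambda v: v > 90, "too many DB connections")
store.add_alert("app.latency_ms", lambda v: v > 500, "slow responses")
print(store.check_alerts())  # ['too many DB connections']
```

Because alert rules only read from the store, a collector can be swapped out without touching any rule that depends on its metrics.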
Metrics also allows you to programmatically create user accounts and transparently add users to the Metrics platform. This means that if you’re an IaaS, SaaS or PaaS provider and want to provide your users with the ability to monitor their use of your service, you can use Metrics to do so by integrating it at whatever level best fits your needs: data repository, instruments for your own dashboards, or complete dashboards.
To further ease the adoption of Metrics, we’re building a community and ecosystem with tools and applications that customers can use to monitor and manage their cloud and data center infrastructures. In other (non-IT) markets, we’ll work with partners who use Metrics as a platform for solutions they provide to their customers.
If this sounds appealing, give Metrics a try; we offer an unlimited, 30-day free trial.
You’re watching the big game; your favorite player shoots the ball, and…nothing. That’s what happened to University of Connecticut women’s basketball fans last year when Connecticut Public Broadcasting Network’s (CPBN) servers buckled under unexpectedly heavy traffic while broadcasting an intense, live game online. Because alumni and fans depend on the feed to watch games from all over the world, CPBN quickly contacted Rackspace to put technology in place to prevent a repeat of that experience. Rackspace combined Cloud Servers and Cloud Load Balancers with Scalr’s auto-scaling platform and CapCal’s Web Performance Testing from Grid Robotics to provide instant scaling for peak periods and performance testing to better understand and plan traffic.
Why Rackspace?
“With ten games under our belt so far this season, we are providing superior service to our viewers with sub-second response times. We anticipate at most a one-second response time even during peak loads for most viewers.”
-Derrick Ellis, Director of Online and New Media at Connecticut Public Broadcasting
Read the entire Connecticut Public Broadcasting case study now.
Today's guest blogger is Adam Gray. Adam is a Product Manager on the Elastic MapReduce Team.
-- Jeff;
Apache Hadoop and NoSQL databases are complementary technologies that together provide a powerful toolbox for managing, analyzing, and monetizing Big Data. That’s why we were so excited to provide out-of-the-box Amazon Elastic MapReduce (Amazon EMR) integration with Amazon DynamoDB, providing customers an integrated solution that eliminates the often prohibitive costs of administration, maintenance, and upfront hardware. Customers can now move vast amounts of data into and out of DynamoDB, as well as perform sophisticated analytics on that data, using EMR’s highly parallelized environment to distribute the work across the number of servers of their choice. Further, as EMR uses a SQL-based engine for Hadoop called Hive, you need only know basic SQL while we handle distributed application complexities such as estimating ideal data splits based on hash keys, pushing appropriate filters down to DynamoDB, and distributing tasks across all the instances in your EMR cluster.
In this article, I’ll demonstrate how EMR can be used to efficiently export DynamoDB tables to S3, import S3 data into DynamoDB, and perform sophisticated queries across tables stored in both DynamoDB and other storage services such as S3.
We will also use sample product order data stored in S3 to demonstrate how you can keep current data in DynamoDB while storing older, less frequently accessed data in S3. By exporting your rarely used data to Amazon S3, you can reduce your storage costs while preserving the low-latency access required for high-velocity data. Further, exported data in S3 is still directly queryable via EMR (and you can even join your exported tables with current DynamoDB tables).
The sample order data uses the schema below. This includes Order ID as its primary key, a Customer ID field, an Order Date stored as the number of seconds since epoch, and Total representing the total amount spent by the customer on that order. The data also has folder-based partitioning by both year and month, and you’ll see why in a bit.
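Because Order Date is stored as epoch seconds and the exported data is partitioned by year and month, each order maps to exactly one folder. The sketch below shows that mapping. It assumes UTC, Hive-style key=value folder names, and a zero-padded month; the bucket name and exact folder layout are hypothetical, since the post does not spell out the real paths.

```python
from datetime import datetime, timezone


def partition_for(order_date: int) -> tuple[str, str]:
    """Return the (year, month) partition values for an epoch-seconds order date.

    Assumption: partitions are computed in UTC with a zero-padded month.
    """
    dt = datetime.fromtimestamp(order_date, tz=timezone.utc)
    return f"{dt.year:04d}", f"{dt.month:02d}"


def partition_prefix(base: str, order_date: int) -> str:
    """Build a Hive-style key=value folder prefix (hypothetical layout)."""
    year, month = partition_for(order_date)
    return f"{base.rstrip('/')}/year={year}/month={month}/"


print(partition_prefix("s3://example-bucket/orders", 1325376000))
# s3://example-bucket/orders/year=2012/month=01/
```

Partitioning this way lets Hive read only the folders a query actually needs instead of scanning every exported order.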
Creating a DynamoDB Table
Let’s create a DynamoDB table for the month of January 2012 named Orders-2012-01. We will specify Order ID as the primary key. By using a table for each month, it is much easier to export data and delete tables over time when they no longer require low-latency access.
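The table-per-month convention makes retention a matter of comparing names. Here is a minimal sketch, assuming every table follows the Orders-YYYY-MM pattern used in this post. The helper names and the "keep the N most recent months" policy are hypothetical illustrations, not part of EMR or DynamoDB.

```python
def table_name(year: int, month: int) -> str:
    """Monthly table name in the style used in this post, e.g. Orders-2012-01."""
    return f"Orders-{year:04d}-{month:02d}"


def months_back(year: int, month: int, n: int) -> tuple[int, int]:
    """Step back n months from (year, month)."""
    index = year * 12 + (month - 1) - n
    return index // 12, index % 12 + 1


def tables_to_archive(existing: list[str], year: int, month: int, keep: int) -> list[str]:
    """Tables older than the `keep` most recent months (counting the current one).

    These are candidates to export to S3 and then delete from DynamoDB.
    """
    cutoff = table_name(*months_back(year, month, keep - 1))
    # Zero-padded names sort chronologically, so plain string comparison works.
    return sorted(n for n in existing if n.startswith("Orders-") and n < cutoff)
```

For example, in March 2012 with a two-month window, Orders-2011-12 and Orders-2012-01 would be archived while Orders-2012-02 and Orders-2012-03 stay in DynamoDB.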
For this sample, a read capacity and a write capacity of 100 units each should be more than sufficient. When setting these values, keep in mind that the larger the EMR cluster, the more capacity it will be able to take advantage of. Further, you will be sharing this capacity with any other applications using your DynamoDB table.
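Sharing provisioned capacity comes down to simple arithmetic. Later in this walkthrough, EMR's share of read throughput is controlled with dynamodb.throughput.read.percent, which defaults to 50%. The sketch below only models that proportion. The helper names are hypothetical, and EMR's actual rate-control behavior is not described in this post.

```python
def emr_read_budget(provisioned_read_units: float, read_percent: float = 0.5) -> float:
    """Approximate read units per second an EMR job aims to consume.

    read_percent mirrors the idea behind dynamodb.throughput.read.percent
    (0.5 means half of the provisioned read capacity).
    """
    if read_percent <= 0:
        raise ValueError("read_percent must be positive")
    return provisioned_read_units * read_percent


def left_for_others(provisioned_read_units: float, read_percent: float = 0.5) -> float:
    """Read capacity remaining for other applications sharing the table."""
    used = emr_read_budget(provisioned_read_units, read_percent)
    return max(0.0, provisioned_read_units - used)
```

With the 100 read units used here, the default leaves 50 units for other applications, while a setting of 1.0 leaves none.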
Launching an EMR Cluster
Please follow Steps 1-3 in the EMR for DynamoDB section of the Elastic MapReduce Developer Guide to launch an interactive EMR cluster and SSH to its Master Node to begin submitting SQL-based queries. Note that we recommend you use at least three instances of m1.large size for this sample.
At the Hadoop command prompt on the master node, type hive. You should see a hive prompt: hive>
As no other applications will be using our DynamoDB table, let’s tell EMR to use 100% of the available read throughput (by default it will use 50%). Note that this can adversely affect the performance of other applications simultaneously using your DynamoDB table and should be set cautiously.
SET dynamodb.throughput.read.percent=1.0;
Creating Hive Tables
Outside data sources are referenced in your Hive cluster by creating an EXTERNAL TABLE. First, let’s create an EXTERNAL TABLE for the exported order data in S3. Note that this simply creates a reference to the data; no data is moved yet.
You can see that we specified the data location, the ordered data fields, and the folder-based partitioning scheme.
Now let’s create an EXTERNAL TABLE for our DynamoDB table.
CREATE EXTERNAL TABLE orders_ddb_2012_01 ( order_id string, customer_id string, order_date bigint, total double )
This is a bit more complex. We need to specify the DynamoDB table name, the DynamoDB storage handler, the ordered fields, and a mapping between the EXTERNAL TABLE fields (which can’t include spaces) and the actual DynamoDB fields.
Now we’re ready to start moving some data!
Importing Data into DynamoDB
In order to access the data in our S3 EXTERNAL TABLE, we first need to specify which partitions we want in our working set via the ADD PARTITION command. Let’s start with the data for January 2012.
Now if we query our S3 EXTERNAL TABLE, only this partition will be included in the results. Let’s load all of the January 2012 order data into our external DynamoDB table. Note that this may take several minutes.
INSERT OVERWRITE TABLE orders_ddb_2012_01
Looks a lot like standard SQL, doesn’t it?
Querying Data in DynamoDB Using SQL
Now let’s find the top 5 customers by spend over the first week of January. Note the use of unix_timestamp, since order_date is stored as the number of seconds since epoch.
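To make the semantics of that query concrete, here is the same aggregation in plain Python over a handful of hypothetical order rows: filter to the week, sum totals per customer, then take the top 5. One caveat: Hive's unix_timestamp interprets a date string in the cluster's time zone, and this sketch assumes UTC. Treat it as an illustration of the logic, not a replacement for the distributed Hive query.

```python
from collections import defaultdict
from datetime import datetime, timezone


def epoch_seconds(year: int, month: int, day: int) -> int:
    """Midnight UTC on the given date, as seconds since epoch (UTC assumed)."""
    return int(datetime(year, month, day, tzinfo=timezone.utc).timestamp())


WEEK_START = epoch_seconds(2012, 1, 1)  # 1325376000
WEEK_END = epoch_seconds(2012, 1, 8)    # exclusive upper bound


def top_customers(orders, start, end, n=5):
    """Sum order totals per customer within [start, end) and return the top n."""
    spend = defaultdict(float)
    for order in orders:
        if start <= order["order_date"] < end:
            spend[order["customer_id"]] += order["total"]
    return sorted(spend.items(), key=lambda kv: kv[1], reverse=True)[:n]
```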
Querying Exported Data in S3
It looks like customer ‘c-2cC5fF1bB’ was the biggest spender for that week. Now let’s query our historical data in S3 to see what that customer spent in each of the final 6 months of 2011. First, though, we have to add the additional data to our working set. The RECOVER PARTITIONS command makes it easy to
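Conceptually, recovering partitions means scanning the table's S3 location for partition folders and registering each one it finds. The sketch below performs that discovery step over a hypothetical list of object keys, assuming Hive-style year=YYYY/month=MM folder names. It only illustrates the idea; EMR's Hive does the real work on the cluster.

```python
import re

# Assumed folder naming: .../year=YYYY/month=MM/<file>
PARTITION_RE = re.compile(r"year=(\d{4})/month=(\d{2})/")


def discover_partitions(keys: list[str]) -> list[tuple[str, str]]:
    """Return the distinct (year, month) partitions found in a key listing."""
    found = set()
    for key in keys:
        match = PARTITION_RE.search(key)
        if match:
            found.add((match.group(1), match.group(2)))
    return sorted(found)
```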
We will now query the 2011 exported data for customer ‘c-2cC5fF1bB’ from S3. Note that the partition fields, both month and year, can be used in your Hive query.
SELECT year, month, customer_id, sum(total) spend, count(*) order_count
Exporting Data to S3
Now let’s export the January 2012 DynamoDB table data to a different S3 bucket owned by you (denoted by YOUR BUCKET in the command). We’ll first need to create an EXTERNAL TABLE for that S3 bucket. Note that we again partition the data by year and month.
Now export the data from DynamoDB to S3, specifying the appropriate partition values for that table’s month and year.
INSERT OVERWRITE TABLE orders_s3_new_export
Note that if this were the end of a month and you no longer needed low-latency access to that table’s data, you could also delete the table in DynamoDB. You may also want to terminate your job flow from the EMR console now so that you are not charged any further.
That’s it for now. Please visit our documentation for more examples, including how to specify the format and compression scheme for your exported files.
-- Adam Gray, Product Manager, Amazon Elastic MapReduce.
The increase in user-generated content, regulatory requirements, digital entertainment, social media, and mobile usage demands more storage resources to manage growing volumes of data. The hardware and systems needed to stay ahead of that growth can be confusing and costly. Taking advantage of the elastic nature and utility pricing model of cloud computing helps businesses drive down storage costs and avoid huge investments. Today, Cloud Avengers save a business drowning under a deluge of data by utilizing cloud storage.
Next time, Cloud Avengers take on malicious hackers. Check out more adventures from Cloud Avengers below:
Cloud Avengers Save the Day with Cloud Files
Cloud Avengers Knock Out System Crashes
Cloud Avengers Annihilate Software Bugs
Cloud Avengers Free the Server Room
Website Traffic Cleared by Cloud Avengers
The coming year will likely see unprecedented growth in cloud computing, especially as people become more comfortable with cloud interfaces and cloud security matures.
One of the things that drives people to the cloud is scalability. The ability to scale up or down at a moment’s notice is one of the cloud’s greatest benefits. Scaling is vital for anyone who expects usage to increase, for example a major product launch that will bring thousands of new visitors to a site.
Vertical Scaling
Vertical scaling is the easiest. Essentially, it resizes your server with the click of a button and no change to code. The downside is that vertical scaling is limited by the fact that you can only get as big as one server. If that larger server still can’t absorb the traffic hitting your site, you are stuck.
I like to think of vertical scaling like a blowfish and web traffic like an aquatic predator such as a shark. If the predator is a smaller shark, the blowfish could inflate and scare the little shark away. However, if the shark is the size of a Great White, there is no amount of puffing up that the blowfish can do to prevent the shark from overtaking him.
Horizontal Scaling
Horizontal scaling affords the ability to scale “wider” to deal with the traffic. Essentially, you run your application on multiple servers and can add more servers to help handle the traffic.
Keeping with the aquatic analogy, horizontal scaling would be like little fish, such as sardines, coming together to form a school that can synchronize its movements to thwart the shark. If a shark were to attack this group, he might get a couple of them, but there is no way for him to get all of the fish.
When customers hear of the differences between vertical and horizontal scaling, most of them reply, “I want to scale horizontally for sure. I want that big cluster of servers to handle the traffic.”
However, you must note that horizontal scaling can’t be done on a moment’s notice. Horizontal scaling requires some advance planning – you have to prep for it. You have to make sure that all your code is in line, that your resources are available and that your architecture can handle this type of scale. You essentially have to break down your code into different functions that scale on their own, such as webheads and databases.
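When the application is broken into tiers that scale on their own, each tier can be sized separately. Here is a rough capacity sketch. The load figures and per-server capacities are hypothetical; in practice they would come from your own load testing. The minimum of two servers per tier is an assumed redundancy rule, not a Rackspace requirement.

```python
import math


def servers_needed(peak_load: float, capacity_per_server: float, minimum: int = 2) -> int:
    """Servers required to absorb peak_load, never fewer than `minimum`."""
    return max(minimum, math.ceil(peak_load / capacity_per_server))


def plan_tiers(peak: dict[str, float], capacity: dict[str, float]) -> dict[str, int]:
    """Size each tier independently from its own peak load and per-server capacity."""
    return {tier: servers_needed(peak[tier], capacity[tier]) for tier in peak}


# Hypothetical numbers: 4,500 requests/s at the webheads (400/s each)
# and 1,200 queries/s at the database read tier (1,500/s each).
plan = plan_tiers({"web": 4500, "db": 1200}, {"web": 400, "db": 1500})
```

Because each tier is computed independently, a traffic spike that mostly hits the webheads raises only the web count. That is exactly the flexibility a single, vertically scaled server cannot offer.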
Our Managed Cloud Rackers can be a strong partner to help you code to scale. While we support many of the core technologies when it comes to web hosting, our level of expertise does not extend to customized applications.
To ensure that you are coding your application to scale instead of coding to fail, you should consider the following four points:
1) Have a load balancer in front of your configuration
This is the most important, easiest and most beneficial thing a customer can do. Even if you are going to host a single server, host it behind a load balancer. The reason is that the support team can move the server around or add a server without ever having to change the DNS.
2) Split out different layers of your configuration
Make sure that you have the different layers of your configuration split apart so that you can “turn up” what is needed. Think of an equalizer on a stereo: it gives you the ability to turn up the treble or tone down the bass independently of each other. You want to have this same ability on your config, to be able to increase the webheads or the database independently of each other.
3) Develop for horizontal scaling from the beginning
As you develop your application or software, have the conversation with your development team to let them know that you want to scale horizontally. An ounce of prevention is worth a pound of cure, and your application will be ready to go if you plan to scale horizontally from the outset.
4) Give your Rackspace Managed Cloud team a call
While not experts on your particular application, we are experts in most web technologies and infrastructure and can help provide insight on what has worked in the past. If we can understand the function of your site, we can understand what type of traffic you might encounter and can make suggestions on how to handle it.
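The first point, putting even a single server behind a load balancer, works because clients only ever see the balancer's address. Here is a toy round-robin balancer that illustrates this. The class, endpoint and server names are hypothetical and do not represent Rackspace's Cloud Load Balancers API.

```python
class RoundRobinBalancer:
    """Toy round-robin balancer, for illustration only.

    Clients always connect to one stable endpoint; the pool of backend
    servers behind it can change without touching DNS.
    """

    def __init__(self, endpoint: str, backends: list[str]):
        self.endpoint = endpoint          # the name clients resolve via DNS
        self._backends = list(backends)   # servers currently in the pool
        self._cursor = 0                  # position of the next backend to use

    def add_backend(self, backend: str) -> None:
        self._backends.append(backend)

    def remove_backend(self, backend: str) -> None:
        self._backends.remove(backend)

    def route(self) -> str:
        """Pick the next backend in rotation for an incoming request."""
        if not self._backends:
            raise RuntimeError("no backends in the pool")
        backend = self._backends[self._cursor % len(self._backends)]
        self._cursor += 1
        return backend
```

Starting with one webhead and later calling add_backend("web2") spreads requests across both servers, while endpoint never changes.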
If you consider these points as you begin creating an application for the cloud, you will be able to code to scale instead of coding to fail.
Joseph Palumbo leads the Managed Cloud Account Managers team at Rackspace. Read his previous post Using the Cloud to Troubleshoot the Cloud for more Managed Cloud information.
Warning: If you don't have a data center, or if all of your IT infrastructure is already in the cloud, you may not need to read this post! But feel free to pass it along to your friends and colleagues.
The Storage Gateway
Our new AWS Storage Gateway service connects an on-premises software appliance with cloud-based storage to integrate your existing on-premises applications with the AWS storage infrastructure in a seamless, secure, and transparent fashion. Watch this video for an introduction:
Data stored in your current data center can be backed up to Amazon S3, where it is stored as Amazon EBS snapshots. Once there, you will benefit from S3's low cost and intrinsic redundancy. In the event you need to retrieve a backup of your data, you can easily restore these snapshots locally to your on-premises hardware. You can also access them as Amazon EBS volumes, enabling you to easily mirror data between your on-premises and Amazon EC2-based applications.
You can install the AWS Storage Gateway's software appliance on a host machine in your data center. Here's how all of the pieces fit together:
The AWS Storage Gateway allows you to create storage volumes and attach these volumes as iSCSI devices to your on-premises application servers. The volumes can be Gateway-Stored (right now) or Gateway-Cached (soon) volumes. Gateway-Stored volumes retain a complete copy of the volume on the local storage attached to the on-premises host, while uploading backup snapshots to Amazon S3. This provides low-latency access to your entire data set while providing durable off-site backups. Gateway-Cached volumes will use the local storage as a cache for frequently-accessed data; the definitive copy of the data will live in the cloud. This will allow you to offload your storage to Amazon S3 while preserving low-latency access to your active data.
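The difference between the two volume types is about which copy is authoritative. Here is a conceptual model of a Gateway-Cached volume: the cloud copy is definitive, and local storage holds only the most recently used blocks (LRU eviction). This is a simplified illustration, not how the gateway is implemented. In particular, the write-through behavior is an assumption made to keep the example short. A Gateway-Stored volume would instead keep every block locally and upload snapshots.

```python
from collections import OrderedDict


class CachedVolume:
    """Conceptual Gateway-Cached volume: cloud copy is authoritative, LRU local cache."""

    def __init__(self, cloud: dict, cache_blocks: int):
        self.cloud = cloud              # stands in for the definitive copy in S3
        self.cache = OrderedDict()      # block_id -> data, least recent first
        self.capacity = cache_blocks    # how many blocks fit in local storage
        self.misses = 0                 # reads that had to go to the cloud

    def read(self, block_id):
        if block_id in self.cache:
            self.cache.move_to_end(block_id)   # mark as most recently used
            return self.cache[block_id]
        self.misses += 1
        data = self.cloud[block_id]
        self._store(block_id, data)
        return data

    def write(self, block_id, data):
        self.cloud[block_id] = data     # simplification: write through immediately
        self._store(block_id, data)

    def _store(self, block_id, data):
        self.cache[block_id] = data
        self.cache.move_to_end(block_id)
        while len(self.cache) > self.capacity:
            self.cache.popitem(last=False)     # evict the least recently used block
```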
Gateways can connect to AWS directly or through a local proxy. You can connect through AWS Direct Connect if you would like, and you can also control the amount of inbound and outbound bandwidth consumed by each gateway. All data is compressed prior to upload.
Each gateway can support up to 12 volumes and a total of 12 TB of storage. You can have multiple gateways per account and you can choose to store data in our US East (Northern Virginia), US West (Northern California), US West (Oregon), EU (Ireland), Asia Pacific (Singapore), or Asia Pacific (Tokyo) Regions.
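With both a 12-volume limit and a 12 TB limit per gateway, the number of gateways you need depends on how your volumes are sized. The sketch below estimates the count with first-fit-decreasing packing. That is a common heuristic and not guaranteed to be optimal. It models only the two limits stated in this post, and the volume sizes in the test are hypothetical.

```python
MAX_VOLUMES_PER_GATEWAY = 12  # per-gateway limit stated in the post
MAX_TB_PER_GATEWAY = 12       # per-gateway limit stated in the post


def gateways_needed(volume_sizes_tb: list[float]) -> int:
    """Estimate gateway count by placing the largest volumes first (first fit)."""
    gateways: list[list[float]] = []  # each entry: [volume_count, total_tb]
    for size in sorted(volume_sizes_tb, reverse=True):
        if size > MAX_TB_PER_GATEWAY:
            raise ValueError(f"a {size} TB volume exceeds one gateway's capacity")
        for gw in gateways:
            if gw[0] < MAX_VOLUMES_PER_GATEWAY and gw[1] + size <= MAX_TB_PER_GATEWAY:
                gw[0] += 1
                gw[1] += size
                break
        else:
            gateways.append([1, size])  # no existing gateway fits: add one
    return len(gateways)
```

Thirteen 1 TB volumes need two gateways because of the volume-count limit, while two 8 TB volumes need two because of the capacity limit.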
The first release of the AWS Storage Gateway takes the form of a VM image for VMware ESXi 4.1 (we plan on supporting other virtual environments in the future). Adequate local disk storage, either Direct Attached or SAN (Storage Area Network), is needed for your application storage (used by your iSCSI storage volumes) and working storage (data queued up for writing to AWS). We currently support mounting of our iSCSI storage volumes using the Microsoft Windows and Red Hat iSCSI Initiators.
Up and Running
During the installation and configuration process you will be able to create up to 12 iSCSI storage volumes per gateway. Once installed, each gateway will automatically download, install, and deploy updates and patches. This activity takes place during a maintenance window that you can set on a per-gateway basis.
The AWS Management Console includes complete support for the AWS Storage Gateway. You can create volumes, create and restore snapshots, and establish a schedule for snapshots. Snapshots can be scheduled at 1, 2, 4, 8, 12, or 24 hour intervals. Each gateway reports a number of metrics to Amazon CloudWatch for monitoring.
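Each of the allowed snapshot intervals (1, 2, 4, 8, 12 or 24 hours) divides 24 evenly, so a schedule produces the same snapshot times every day. The sketch below lists the snapshot times for a chosen interval. It models only the schedule arithmetic; the function is hypothetical and is not part of the Storage Gateway console or API.

```python
from datetime import datetime, timedelta

# Snapshot intervals listed in the post for the AWS Management Console.
ALLOWED_INTERVAL_HOURS = (1, 2, 4, 8, 12, 24)


def snapshot_times(start: datetime, interval_hours: int, until: datetime) -> list[datetime]:
    """List the snapshot times in [start, until) for a given schedule interval."""
    if interval_hours not in ALLOWED_INTERVAL_HOURS:
        raise ValueError(f"interval must be one of {ALLOWED_INTERVAL_HOURS} hours")
    times = []
    t = start
    while t < until:
        times.append(t)
        t += timedelta(hours=interval_hours)
    return times
```

An 8-hour schedule starting at midnight, for example, snapshots at 00:00, 08:00 and 16:00 each day.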
The snapshots are stored as Amazon EBS (Elastic Block Store) snapshots. You can create an EBS volume using a snapshot of one of your local gateway volumes, or vice versa. Does this give you any interesting ideas?
The Gateway in Action
I expect the AWS Storage Gateway will be put to use in all sorts of ways. Some that come to mind are:
Security Considerations
We believe that the AWS Storage Gateway will be at home in the enterprise, so I'll cover the inevitable security questions up front. Here are the facts:
Costs
All AWS users are eligible for a free trial of the AWS Storage Gateway. After that, there is a charge of $125 per month for each activated gateway. The usual EBS snapshot storage rates apply ($0.14 per Gigabyte-month in the US-East Region), as do the usual AWS prices for outbound data transfer (there's no charge for inbound data transfer). More pricing information can be found on the Storage Gateway Home Page. If you are eligible for the AWS Free Usage Tier, you get up to 1 GB of free EBS snapshot storage per month as well as 15 GB of outbound data transfer.
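As a rough illustration of how these charges combine, the sketch below adds the per-gateway fee to snapshot storage at the US-East rate quoted above. It leaves out the free trial and outbound data transfer, and the prices are the ones stated in this post, so check the Storage Gateway Home Page for current rates.

```python
GATEWAY_MONTHLY_USD = 125.00       # per activated gateway, as quoted in the post
SNAPSHOT_USD_PER_GB_MONTH = 0.14   # US-East EBS snapshot rate quoted in the post


def monthly_cost(gateways: int, snapshot_gb: float, free_snapshot_gb: float = 0.0) -> float:
    """Gateway fees plus snapshot storage; excludes free trial and data transfer."""
    billable_gb = max(0.0, snapshot_gb - free_snapshot_gb)
    total = gateways * GATEWAY_MONTHLY_USD + billable_gb * SNAPSHOT_USD_PER_GB_MONTH
    return round(total, 2)
```

Two gateways with 500 GB of snapshots come to $250 + $70 = $320 per month. With the Free Usage Tier, passing free_snapshot_gb=1 subtracts the first free gigabyte.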
On the Horizon
As I mentioned earlier, the first release of the AWS Storage Gateway supports Gateway-Stored volumes. We plan to add support for Gateway-Cached volumes in the coming months.
We'll add more features to our roadmap as soon as our users (this means you) start to use the AWS Storage Gateway and send feedback our way.
Learn More
You can visit the Storage Gateway Home Page or read the Storage Gateway User Guide to learn more.
We will be hosting a Storage Gateway webinar on Thursday, February 23rd. Please attend if you would like to learn more about the Storage Gateway and how it can be used for backup, disaster recovery, and data mirroring scenarios. The webinar is free and open to all, but space is limited and you need to register!
-- Jeff;
You can now launch Amazon Relational Database Service (RDS) DB instances inside of a Virtual Private Cloud (VPC).
Some Background
The Relational Database Service takes care of all of the messiness associated with running a relational database. You don't have to worry about finding and configuring hardware, installing an operating system or a database engine, setting up backups, arranging for fault detection and failover, or scaling compute or storage as your needs change.
The Virtual Private Cloud lets you create a private, isolated section of the AWS Cloud. You have complete control over IP address ranges, subnetting, routing tables, and network gateways to your own data center and to the Internet.
Here We Go
Before you launch an RDS DB Instance inside of a VPC, you must first create the VPC and partition its IP address range into the desired subnets. You can do this using the VPC wizard pictured above, the VPC command line tools, or the VPC APIs.
Then you need to create a DB Subnet Group. The Subnet Group should have at least one subnet in each Availability Zone of the target Region; it identifies the subnets (and the corresponding IP address ranges) where you would like to be able to run DB Instances within the VPC. This will allow a Multi-AZ deployment of RDS to create a new standby in another Availability Zone should the need arise. You need to do this even for Single-AZ deployments, just in case you want to convert them to Multi-AZ at some point.
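Before creating the DB Subnet Group, it can be useful to check the two conditions described above: every subnet lies inside the VPC's address range, and every Availability Zone in the Region has at least one subnet. Here is a minimal offline sketch that uses Python's standard ipaddress module. The subnet IDs, CIDR blocks and zone names are hypothetical, and the function is not part of any AWS tool.

```python
import ipaddress


def check_db_subnet_group(vpc_cidr: str, subnets: list[tuple[str, str, str]],
                          region_azs: set[str]) -> list[str]:
    """Return a list of problems; subnets are (subnet_id, cidr, availability_zone)."""
    vpc = ipaddress.ip_network(vpc_cidr)
    problems = []
    for subnet_id, cidr, _az in subnets:
        if not ipaddress.ip_network(cidr).subnet_of(vpc):
            problems.append(f"{subnet_id}: {cidr} is outside {vpc_cidr}")
    covered = {az for _, _, az in subnets}
    for az in sorted(region_azs - covered):
        problems.append(f"no subnet in {az}")  # Multi-AZ standby could not land here
    return problems
```

An empty result means the group covers every zone, which keeps the option of converting a Single-AZ deployment to Multi-AZ later.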
You can create a DB Security Group, or you can use the default. The DB Security Group gives you control over access to your DB Instances; you can allow access from EC2 instances with specific EC2 Security Group or VPC Security Group membership, or from designated ranges of IP addresses. You can also use VPC subnets and the associated network Access Control Lists (ACLs) if you'd like. You have a lot of control and a lot of flexibility.
The next step is to launch a DB Instance within the VPC while referencing the DB Subnet Group and a DB Security Group. With this release, you are able to use the MySQL DB engine (we plan to add other options over time). The DB Instance will have an Elastic Network Interface using an IP address selected from your DB Subnet Group. You can use the IP address to reach the instance if you'd like, but we recommend that you use the instance's DNS name instead, since the IP address can change during failover of a Multi-AZ deployment.
Upgrading to VPC
If you are running an RDS DB Instance outside of a VPC, you can snapshot the DB Instance and then restore the snapshot into the DB Subnet Group of your choice. You cannot, however, access or use snapshots taken from within a VPC outside of the VPC. This is a restriction that we have put into place for security reasons.
Use Cases and Access Options
You can put this new combination (RDS + VPC) to use in a variety of ways. Here are some suggestions:
Your Turn
You can launch RDS instances in your VPCs today in all of the AWS Regions except AWS GovCloud (US). What are you waiting for?
-- Jeff;
Can your website handle a sudden spike in activity? When LoveBook Online, an online retailer of custom books, made their first national television appearance, their site wasn’t ready to handle the ensuing traffic and crashed. When their second national television appearance rolled around, they had Managed Cloud and Cloud Files on their side and traffic flowed smoothly. (Click here to read the LoveBook Online case study.) Today, the Cloud Avengers help a marketing team discover the cloud to handle activity spikes and enable the team to launch campaigns faster without worrying about potential traffic jams.
Come back tomorrow to see how the Cloud Avengers bring a small solution to a big data storage problem. Check out more adventures from Cloud Avengers below:
Cloud Avengers Save the Day with Cloud Files
Cloud Avengers Knock Out System Crashes
Cloud Avengers Annihilate Software Bugs
Cloud Avengers Free the Server Room
Cloud Avengers Rescue Broken Website Images