Adopting the Enterprise Architecture
The enterprise architecture is a holistic architecture for large enterprises to use IBM Cloud® at scale while staying within IBM Cloud limits.
The enterprise architecture follows best practices and principles across networking, resource organization, security, Infrastructure as Code (IaC), and other domains. The impact of this holistic approach is a dramatic reduction in application team overhead, leading to increased productivity, security, and compliance. It is necessary to understand the enterprise architecture before reading this paper.
While each enterprise has a unique starting point, some form of transition is typically required for an enterprise to use these recommendations. By adopting the recommendations, your enterprise gains the benefits of the architecture such as high scale, reduced cost, and improved compliance and security posture.
This paper discusses strategies that can be used to adopt the enterprise architecture from almost any starting point.
Adoption framework
Adoption of the enterprise architecture can be framed as a set of steps and decision points. This framework provides a methodical process to transition workloads to the enterprise architecture by using the following steps:
Understanding benefits and objectives
The enterprise architecture uses best practices to achieve compliance, scale, efficiency, security, and effective FinOps. This architecture is aligned with IBM internal use and benefits from high levels of IBM support. For more details, see the Enterprise architecture white paper.
Depending on your organization's starting point and objectives, it might be desirable to adopt only a subset of the recommendations. For this reason, it is important to define your objectives for transitioning a particular workload.
To plan for your adoption of the enterprise architecture, ask yourself:
- What is the scope of adoption? Should certain workloads be excluded?
- What is the end state that you want for particular workloads?
- What is the timeline?
- Does adoption pose more risk for certain elements?
Justifying a change
Key motivations for adopting the enterprise architecture include:
- Cost savings. Shared infrastructure efficiencies, reduced operations overhead, and better FinOps practices reduce cost.
- Increased compliance and security. Deployable architectures start secure and compliant and can be centrally maintained. Infrastructure audits are handled centrally and with fewer resources.
- Easier operations. A central team manages infrastructure, which requires fewer operations resources and maximizes resources with scarce expertise.
- Reduced need for expensive skill sets that are outside the focus of the business. Networking, security, and compliance expertise can be centralized into one team for greater effect.
- Increased scale. Scaling limitations for applications and organizations are avoided.
- Increased governance. The architecture reduces the opportunity for errors, allows rollbacks, and eliminates the possibility for bad actors to undermine security controls.
- Improved application developer productivity. Application developers don't need to concern themselves with infrastructure, high availability, business continuity and disaster recovery (BCDR), compliance, or security.
Assessing existing resources and workloads
Before you determine an adoption strategy, build an inventory of existing resources and workloads.
This inventory might include catalog resources such as clusters, databases, and other services, as well as items such as access policies and account settings. Then, assess the inventory to determine how the existing resources and workloads would fit into the recommended account structure.
- Locate accounts and workloads that need to be moved. Make sure that all resources are accounted for, because missed resources cause problems later on. Use billing records to locate all IBM Cloud accounts that potentially contain resources. Use global search to locate resources within accounts and record them.
- Logically partition the existing resources into groups by architecture, sensitivity, automation, and so on. Then, subgroup these resources based on where they fall in the enterprise architecture account structure. Be sure to include groups for network infrastructure, common services, backups, nonproduction and production workloads, and so on, as including these groups eases reasoning about the transition.
- Take note of key factors that influence adoption strategies. Some key factors include types of data storage (and resulting data migration capabilities), availability requirements for applications (and possibility for scheduled downtime), business criticality of applications, scale, and complexity.
- Delete unneeded resources or mark them as unneeded by using tags. Don't waste effort on unneeded resources or workloads. The Resource Explorer can help you find unused catalog resources and Identity and Access Management can help you find unused security resources.
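The partitioning step above can be sketched in a few lines of Python. This is a minimal illustration only: the `Resource` record, the resource kinds, the tag names, and the group names are all hypothetical placeholders, not an IBM Cloud API or an official grouping scheme. It assumes the inventory has already been exported into simple records.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Resource:
    """One row of a hypothetical inventory export (not an IBM Cloud type)."""
    name: str
    account: str
    kind: str                       # for example "cluster", "database", "vpc"
    tags: frozenset = frozenset()   # tags recorded during the inventory


def target_group(resource: Resource) -> str:
    """Assign a resource to an illustrative target group.

    Rules are checked in order, so a resource tagged "unneeded" is always
    routed to decommissioning, whatever else it is tagged with.
    """
    if "unneeded" in resource.tags:
        return "decommission"
    if resource.kind in {"vpc", "transit-gateway", "direct-link"}:
        return "network-infrastructure"
    if resource.kind == "backup":
        return "backups"
    if "production" in resource.tags:
        return "production-workloads"
    if "nonproduction" in resource.tags:
        return "nonproduction-workloads"
    return "unclassified"           # needs a human decision before migration


def partition(inventory):
    """Group resource names by target group."""
    groups = defaultdict(list)
    for resource in inventory:
        groups[target_group(resource)].append(resource.name)
    return dict(groups)
```

Anything that lands in the `unclassified` group is a signal that the inventory is incomplete, which is exactly the kind of missed resource that causes problems later in the transition.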
Determining your adoption strategy
Adopting the enterprise architecture might include both organizational and technical transformation to achieve all the benefits.
To determine your adoption strategy:
- Evaluate the delta between the existing architecture and the target architecture for each group of resources. Consider your account, security posture, and level of automation.
- Pick a strategy appropriate for the workload or related group of workloads. Use the decision tree to help with strategy selection.
- Consider how these changes will affect users. Reorganizing operations, security, network, and compliance expertise might be needed. Also, new cloud access procedures and operational processes might be required.
Technical strategies
On the technical side, several strategies can be considered:
- App by App migration. Migrate one workload or a group of workloads at a time into newly created workload accounts. Deploy a new set of accounts that follow the enterprise architecture recommendations. Then dual deploy workloads to both old and new infrastructure until data migration and testing are complete.
- Piecemeal migration. Migrate individual aspects only. For example, move dev to a new structure, or adopt the IAM recommendations, or move to the recommended network architecture only. Details depend on what aspect is being migrated.
- New applications only. Leave existing workloads alone and deploy only new work into the new structure.
- Transform in place. Implement the architecture by gradually transforming existing deployments rather than migrating to a parallel set of infrastructure.
- Hybrid. For example, transform databases in place and then use an app by app approach to move workloads to a parallel infrastructure.
Each strategy has more details, including pros and cons.
Technical strategy decision tree
To help with selecting technical strategies, the following decision tree can be used as a guide:
Keep in mind that any decision tree incorporates only a few key criteria, so be sure to read up on the details of each strategy before adopting.
Nontechnical aspects of adoption
In addition to the technical aspects of adopting the enterprise architecture recommendations, there might be impacts to individual users, procedures, and organizations that should be considered.
- DevOps users might need training on the use of Infrastructure as Code as they transition from directly manipulating cloud resources to adopting Infrastructure as Code.
- All users might need to log in to different cloud accounts and potentially learn how and when to use trusted profiles as the centralized administration model is adopted.
- DevOps procedures and runbooks might need to be updated to align with new network models, centralized administration, IaC, and so on.
- Development and DevOps teams might benefit from reorganization so that experts in core functions such as infrastructure as code, security, networking, and compliance are located in centralized teams that are responsible for developing and maintaining the deployable architectures for shared infrastructure.
- Operations teams might benefit from reorganization so that operations experts are located in centralized teams that operate the shared infrastructure.
Identifying risks and barriers
A solid risk management plan, including identification of potential barriers, is essential to a successful transition.
- Per project, after a strategy is determined, identify workload-specific risks or barriers.
  - Technical challenges might include availability concerns, data sync issues, encryption, application technical limitations, and so on.
  - Nontechnical challenges might include data protection or compliance concerns (for example, GDPR), cost, timing (avoid peak loads), training, org impacts, resourcing, and so on.
- Develop risk mitigation strategies.
  - For example, ensure that migrated data remains encrypted, don't touch live workloads during sync, plan any reorganizations, provide training, and so on.
- Ensure that the plan, including the risk mitigation, is documented for each transition project.
Common challenges
- Live data is not easy to migrate.
  - Data in active applications is changing rapidly and thus cannot be easily migrated through backup and restore.
- Application and infrastructure deployment is not automated.
  - Automation to redeploy is not available or not suitable.
- Applications contain hardcoded references.
  - Applications refer to specific IP, DNS, or other addresses that make setting up a parallel deployment difficult.
  - Applications refer to specific service or user identities that are account-specific, which makes moving to a new account difficult.
- Joining multiple VPCs with transit gateways can be complicated by overlapping address spaces and poorly configured network ACLs.
Preparing for adoption
After you select a strategy, check if any of the following prerequisite steps are required and completed.
Relevant capabilities
Before you apply one of the technical strategies, it can be useful to become familiar with some of the relevant tools and capabilities in IBM Cloud.
- Import existing accounts into an enterprise.
- Move accounts into different account groups.
- Back up and restore databases to a different account to move them.
- Sync databases across accounts.
- Configure a redundant Key Protect instance for HA and handling cross-region or cross-account restore for Hyper Protect Crypto Services.
- To follow an Infrastructure as Code (IaC) approach, resources need to be re-created by using automation within the new account structure. You can easily embrace this strategy by using IBM Cloud projects, which rely on automation to create resources.
- Use deployable architectures to create new compliant infrastructure.
- Terraformer can be used to get a start on building a deployable architecture for existing resources.
- Track manually created resources and tag resources in bulk by using projects.
- Import an existing Schematics workspace into a project.
- Onboard a deployable architecture to the catalog for sharing.
Implementing transition strategies
Implement one or more of the technical strategies to adopt the enterprise architecture. The pros and cons of each strategy are included.
App by app migration
With this strategy, a single application or family of related applications is migrated to a set of workload accounts, which exist in parallel with the existing infrastructure for the application. After the migration is complete, unused infrastructure in the original accounts can be decommissioned.
- Select a workload for migration and add related resources to a project in preparation for tracking resources during migration.
- Update the workload as needed to make it configurable and able to run in all locations. These updates might involve code changes to parameterize hostnames, URLs, IP addresses, and ports.
- Deploy infrastructure as needed in the nonproduction and production workload accounts by deploying architectures from projects that are hosted in the business unit hub account. Ensure that appropriate access, networking, and dependencies are in place as part of the infrastructure setup.
- Configure delivery pipelines to deploy the application to both the original and new infrastructure such that application deployment is synchronized in both environments.
- Migrate data by backing up, restoring, and syncing all related data from the original data storage or service. If periodically migrating data, this step might need to be repeated before applications are activated on the new infrastructure.
- Test the new deployment, update the infrastructure and application, and return to the workload update or infrastructure deployment steps as needed.
- Activate applications on the new infrastructure. For example, update DNS records and the load balancer configuration. Consider routing only a percentage of traffic to start if data can be synced live. Ensure that a failback is available in case issues occur.
- Decommission any unused resources from the original deployment. Use the project configuration from the preparation phase to help locate these resources and complete bulk operations.
This strategy is low risk and gains all of the cost and operation savings that are associated with shared infrastructure and IaC managed workload accounts. Using parallel infrastructure allows for a smooth transition and easy failback should problems occur. However, this approach does temporarily double infrastructure costs and can be slow to run. Also, data sync and infrastructure migration can be difficult. In addition, data services encrypted with BYOK might have extra concerns with migration. For more information about data migration, see relevant capabilities. Despite these drawbacks, this workload migration strategy is likely the best for most organizations.
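The step in this strategy that makes the workload configurable can be illustrated with a small sketch. It assumes the application reads its endpoints from environment variables so that the same build can run against either the original or the new infrastructure. The variable names (`APP_DB_HOST`, `APP_DB_PORT`, `APP_API_URL`), the `Endpoints` type, and the default port are hypothetical choices for illustration, not part of the enterprise architecture.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Endpoints:
    """Addresses that would otherwise be hardcoded in the application."""
    db_host: str
    db_port: int
    api_url: str


def load_endpoints(env=None) -> Endpoints:
    """Read endpoints from a mapping (defaults to the process environment).

    Failing fast on a missing setting is deliberate: a silent fallback to
    the original infrastructure's address would defeat the parallel deployment.
    """
    env = os.environ if env is None else env
    missing = [key for key in ("APP_DB_HOST", "APP_API_URL") if key not in env]
    if missing:
        raise KeyError(f"missing required settings: {', '.join(missing)}")
    return Endpoints(
        db_host=env["APP_DB_HOST"],
        db_port=int(env.get("APP_DB_PORT", "5432")),  # illustrative default
        api_url=env["APP_API_URL"],
    )
```

With this shape, the delivery pipeline for each environment supplies its own values, and the application code is identical in both deployments.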
Piecemeal migration (nonproduction)
With this strategy, nonproduction workloads are moved into separate workload accounts.
Options:
- Use the same process as app by app migration, but migrate only nonproduction workloads. Because nonproduction workloads don't typically have critical data, it might not be necessary to migrate data and even if it is, a period of downtime during the migration can often be much easier to manage.
- Bulk migrate your nonproduction workloads to new infrastructure. This is a similar process to app by app migration, but the infrastructure for a group of nonproduction workloads is deployed and those workloads are switched to deploy to that infrastructure all together. Bulk migration is most appealing if data migration is not required.
Migrating nonproduction workloads into a separate account from production workloads provides an important separation of concerns, making it easier to ensure that users and processes don't accidentally operate against the wrong data or service. Moving only nonproduction workloads eases data migration concerns and further reduces risk as production isn't touched. This strategy works well combined with Transform in place for the production workloads.
Piecemeal migration (networking and shared services)
With this strategy, networking and shared services are set up in a new account and then linked to existing workload accounts, which creates a hybrid architecture. This strategy works well with piecemeal migration of nonproduction and isn't required when using app-by-app migration as a duplicate set of these services can be used instead.
- Add any existing networking and shared services to appropriate projects in the central administrative account in preparation for tracking resources during the migration.
- Deploy new networking and shared services in the new network and services hub account according to the enterprise architecture. You must also assign appropriate access.
- Export transit gateway information and direct link information from any existing deployment of those services.
- Configure the new transit gateway to use a direct link connection.
- Configure the direct link by using the configuration details from the previous direct link.
- Use exported transit gateway information to connect existing VPCs in the original accounts to the new transit gateways.
- Set up the required keys in Hyper Protect Crypto Services (HPCS) or Key Protect for any BYOK-protected services that you require. If existing BYOK-protected services are being retained, this involves reimporting key material and updating the configuration of those BYOK-protected applications.
- Optionally, migrate any shared applications following an application migration pattern.
- Decommission original networking and shared services by using projects to locate redundant resources and support bulk operations.
Existing VPCs might have overlapping addresses that conflict with a flat network design. You must resolve overlapping addresses before implementing the flat network described in the enterprise architecture. This strategy is often best used after separating nonprod from production workloads so that these network changes can be tested with nonproduction workloads. Migrating Key Protect and Hyper Protect Crypto Services to a new account to be used by existing services with KYOK is challenging and might not be possible for all services. Consider leaving existing BYOK protected data services unchanged and using only the new instances of Key Protect/HPCS to protect new data services.
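Overlapping address ranges can be detected before any transit gateway connection is attempted. The following sketch uses Python's standard `ipaddress` module. The VPC names and CIDR blocks in the usage note are made-up examples, and the input format (a mapping from VPC name to a list of address prefixes) is an assumption about how the inventory was recorded.

```python
from ipaddress import ip_network
from itertools import combinations


def find_overlaps(vpc_prefixes):
    """Return sorted pairs of VPC names whose address prefixes overlap.

    vpc_prefixes maps a VPC name to a list of CIDR strings.
    """
    overlaps = set()
    for (name_a, cidrs_a), (name_b, cidrs_b) in combinations(
        sorted(vpc_prefixes.items()), 2
    ):
        # Two VPCs conflict if any prefix of one overlaps any prefix of the other.
        if any(
            ip_network(a).overlaps(ip_network(b))
            for a in cidrs_a
            for b in cidrs_b
        ):
            overlaps.add((name_a, name_b))
    return sorted(overlaps)
```

For example, calling `find_overlaps` with `{"vpc-dev": ["10.10.0.0/16"], "vpc-prod": ["10.10.8.0/24", "10.20.0.0/16"], "vpc-shared": ["10.30.0.0/16"]}` reports the pair `("vpc-dev", "vpc-prod")`, because `10.10.8.0/24` falls inside `10.10.0.0/16`.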
Piecemeal migration (identity and access management)
With this strategy, identity and access management (IAM) is configured in new accounts to support existing users' job functions. This strategy works well with app by app migration.
- Analyze existing access groups, access policies, and trusted profiles to determine which groups of users are permitted which general access. Use IAM audit reports as needed.
- Analyze these groups of users to determine their job functions, for example, developer, operations, or finance.
- Map those job functions into access policies tied to access groups within the enterprise architecture. The enterprise architecture uses an Infrastructure as Code (IaC) approach, so users don't have direct write access to resources. Instead, users have write access to projects, which are used to deploy and update resources.
- Update deployable architectures so that the correct access groups are provisioned, trusted profiles are created, and so on. This should include all workload accounts and administration accounts to make sure that users have the correct access for their job functions.
Do not migrate existing access directly into the new architecture, as best practices for access need to be adopted as a part of this transition. Use trusted profiles, access groups, and projects as a means to create and update resources. The results of adoption are better governance, security, and ease in understanding user access.
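The mapping from job functions to access groups can be drafted as data before it is encoded in a deployable architecture. The sketch below is illustrative only: the job functions come from the examples in this section, but the access group names and the `plan_memberships` helper are hypothetical and do not correspond to any predefined IBM Cloud access group.

```python
# Hypothetical mapping from job function to access groups in the new accounts.
JOB_FUNCTION_GROUPS = {
    "developer": ["project-editor-nonprod"],
    "operations": ["project-editor-nonprod", "project-editor-prod"],
    "finance": ["billing-viewer"],
}


def plan_memberships(user_functions):
    """Derive access group membership from job functions.

    user_functions maps a user to a set of job functions. Returns a tuple of
    (group -> sorted users, sorted list of (user, function) pairs that have
    no mapping and need a decision before the transition).
    """
    plan = {}
    unmapped = []
    for user, functions in user_functions.items():
        for function in functions:
            groups = JOB_FUNCTION_GROUPS.get(function)
            if groups is None:
                unmapped.append((user, function))
                continue
            for group in groups:
                plan.setdefault(group, set()).add(user)
    memberships = {group: sorted(users) for group, users in sorted(plan.items())}
    return memberships, sorted(unmapped)
```

Deriving membership from job functions, rather than copying existing policies, keeps the new accounts free of access that accumulated historically without a clear purpose.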
New applications only
This strategy doesn't attempt to transition existing applications. Rather, it builds out parallel infrastructure for the workload accounts and deploys new applications into that environment. This strategy can be combined with a transform in place, and potentially some piecemeal migration of major common functions like networking and access management.
- Existing applications remain in place. Optionally, consider transform in place for these applications.
- Applications that are newly developed or newly migrated to the cloud are deployed to new workload accounts in alignment with the Enterprise Architecture recommendations.
This strategy is safe and easy, but does not attempt to address existing applications and workloads. The Enterprise Architecture benefits will only apply to new applications.
Transform in place
With this strategy, data migrations are avoided and existing accounts and resources are refactored to better align with enterprise architecture recommendations. Certain common operations need to be considered, but the exact refactoring operations depend on your enterprise's starting point, resulting in various substrategies.
Move from manual resource management to using infrastructure as code, deployable architectures, projects, and Schematics to deploy and update resources.
- Create deployable architectures to manage your existing resources. Use Terraformer to reverse-engineer existing resource deployments into Terraform automation and Terraform state. Generated Terraform and any previously existing Terraform can be used to help create deployable architectures. See relevant capabilities for more information.
- Create a series of projects in an administrative account that's separate from your workload accounts. These projects are used to maintain the existing infrastructure by using IBM Cloud project governance.
- Restrict access to existing resources so that changes can be made only by using projects.
This strategy allows existing workloads to continue to run unchanged, but shifts into an Infrastructure as Code mode of operation that is better governed and more repeatable. To reduce risk, new IaC should be used to update nonproduction workloads before production.
Separate production and nonproduction
An alternative to migrating nonproduction to new accounts is to use access groups, resource groups, tags, and naming conventions to separate nonproduction workloads from production workloads.
- Identify all nonproduction and production resources and apply a "nonproduction" tag and a "production" tag to ensure visibility. Also consider adding a prod or nonprod prefix or suffix to the name of all resources.
- Apply access tags to nonproduction and production resources or use existing resource groups if they happen to properly separate nonprod and prod. Resource groups can also be renamed with a prefix or suffix to make their role clear.
- Adjust the access policies so that users have access to nonproduction, but have limited access to production.
This strategy provides some of the benefits of the recommended separation of nonproduction and production, but is not as safe over the long run as separate accounts. Consider migrating nonproduction workloads to separate accounts for increased safety.
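Because this approach depends on every resource carrying exactly one environment tag, a simple audit of the tagged inventory is useful before access policies are adjusted. The sketch below assumes the inventory is available as a mapping from resource name to its tags. The tag names match the ones suggested in this section, and the function name and report format are hypothetical.

```python
ENVIRONMENT_TAGS = {"production", "nonproduction"}


def audit_environment_tags(resources):
    """Report resources whose environment tagging is missing or ambiguous.

    resources maps a resource name to an iterable of tags.
    """
    untagged, conflicting = [], []
    for name, tags in sorted(resources.items()):
        environments = ENVIRONMENT_TAGS & set(tags)
        if not environments:
            untagged.append(name)        # invisible to tag-based policies
        elif len(environments) > 1:
            conflicting.append(name)     # matches both prod and nonprod policies
    return {"untagged": untagged, "conflicting": conflicting}
```

Both lists should be empty before access is restricted, since an untagged resource escapes tag-based policies and a doubly tagged one is reachable from both environments.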
Designate shared compute infrastructure
Rather than hosting every application on dedicated compute (and potentially DB) resources, designate selected existing compute infrastructure for shared hosting. Shared infrastructure saves in hosting and operational costs.
- Designate selected existing compute infrastructure (for example, Red Hat OpenShift clusters) as shared resources.
- Reorganize so that a single team can manage the shared infrastructure.
- Make any adjustments required to allow the compute infrastructure to be suitable for hosting multiple applications. This might require introducing namespaces in Kubernetes clusters or load balancer pools for VSI clusters.
- Deploy new workloads onto the existing clusters. Optionally, consider consolidating some existing workloads onto the shared infrastructure.
This strategy provides many of the shared infrastructure benefits, although it might not be quite as elegant and easy to manage as new compute infrastructure designed specifically for shared use.
Separate backup infrastructure
Adjust existing data services and applications to send their backups to a separate account for maximum isolation.
- Create a separate account to store backups with limited user access.
- Adjust backup automation to ensure that backups are stored in a separate region and in the backup account, where possible.
- Decommission older backups after they are no longer required.
Refactor the networking to align with the hub and spoke network strategy.
- Designate the account that currently contains the direct link connection as the network hub account.
- Connect VPCs to the direct link by using transit gateway in the hub account. This might involve removing transit gateways that are located elsewhere.
This strategy provides most of the network simplification benefits that are described in the Enterprise Architecture, although it might not be quite as elegant and easy to manage as when core network services are provisioned in a central account.
Hybrid (database in place)
The hybrid strategy leaves your existing databases and data services in place, while you migrate applications and nondata services into the new architecture:
- Leave databases and other data services in place, adjusting only access permissions to align with enterprise architecture recommendations.
- Migrate the applications and nondata services into the enterprise architecture structure by following any of the strategies that are outlined in this white paper. The App by App strategy is recommended.
- Configure/update any access policies and context-based restrictions necessary to allow your application to reach the data services across accounts.
This strategy avoids potentially risky data migrations while realizing the benefits of application workload consolidation, but results in a slightly more complex account architecture because data services are not collocated with their applications. As a result, extra care must be taken with access policies and context-based restrictions to ensure that the data services can be accessed only by the appropriate applications.
Evaluating the transition
Before you decommission any redundant infrastructure post transition, a rigorous evaluation should be performed to ensure that everything is operating as expected.
- Test workloads in target deployment.
- Test user access and processes, and ensure that runbooks are updated.
- Ensure that data is synchronized.
- When parallel infrastructure is used, begin a gradual cutover by allocating a percentage of traffic to the new deployment if possible.
- Scale up to support full workload.
- Complete the cutover and run for a burn-in period.
- Decommission any redundant infrastructure.
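The gradual cutover described above is normally handled by a load balancer or DNS service, but the underlying idea can be shown in a short sketch. It assumes that each request carries a stable key (for example, a user or session identifier) and that the share of traffic sent to the new deployment is raised in stages. The `route` function and the labels `"new"` and `"original"` are hypothetical.

```python
import hashlib


def route(request_key: str, percent_to_new: int) -> str:
    """Choose a deployment for a request during a staged cutover.

    The key is hashed into one of 100 buckets, so a given key always lands
    in the same bucket. Raising percent_to_new therefore only moves keys
    from the original deployment to the new one, never back.
    """
    if not 0 <= percent_to_new <= 100:
        raise ValueError("percent_to_new must be between 0 and 100")
    digest = hashlib.sha256(request_key.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:4], "big") % 100
    return "new" if bucket < percent_to_new else "original"
```

Keeping the assignment deterministic means a user is not bounced between deployments while data is still being synchronized, and setting the percentage back to 0 acts as the failback.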