Five years have flown by. Besides watching the team grow, I have also seen all kinds of people come and go. By my rough estimate, employees who stay at Amazon for more than five years are around the 90th percentile globally, and people who stay on the same team for five years are even rarer. Once you hit the five-year mark, you earn the orange badge1. As one of the few orange-badge holders on my team, I inevitably have some inspiring (or painful) work stories to share, but the busy, fast pace of work means many of those thoughts slip away. Rereading what I wrote before (10 Things I Learned at Amazon), I was struck by how these changes happen all the time; parts of me have clearly changed over these five years, and I am glad the environment has not made me jaded.
Perhaps it is partly because I can revisit those posts, or because I help with interviews: many candidates have mentioned that they read my articles and were inspired by them (whether or not they ended up joining our team). These words and this feedback constantly remind me to keep my beginner's mind and to handle things in a more mature way. Some posts even drew responses from experts in the field, yet I never scheduled time to keep writing. So this post is my attempt, from a five-year vantage point, to summarize what I like about this team and where I think it can improve.
A disclaimer first: although AWS is a major business within the Amazon group, management styles can differ across organizations and roles. Amazon employs hundreds of thousands of people around the world, and running into all sorts of odd departments at work is nothing new, so AWS Support cannot be taken as representative of every team.
Also, this post reflects only my personal observations, does not represent any official position, and is not meant to carry any critical agenda. As an Individual Contributor on the team, I spend most of my time on the front line solving all kinds of engineering problems rather than discussing management or operations topics. Since I do not look at team operations from a manager's macro perspective, and the issues engineers care about differ somewhat from those managers care about, I am only sharing my experience as an engineer; this does not imply that either side is wrong.
After all, my vantage point is that of an engineer, not a manager, so some observations may be limited. I will try to stay as objective as possible, but please judge for yourself.
One thing I have observed that has never changed in the Chinese-language technical team is that it is full of excellent colleagues, many of them engineers with years of industry experience and rich backgrounds. Interestingly, current hiring policy also encourages fresh graduates to join the AWS Support technical team, regularly injecting new blood and making the team highly diverse. New colleagues often bring different experiences and ideas from other companies; some even switch from being customers using AWS products to being the ones solving problems. All these different backgrounds coming together and exchanging ideas build a strong professional network within the team.
Cloud Support Engineer is a global team, with engineers spread across the world. The more responsibility and tasks you take on, the more chances you naturally have to collaborate with colleagues worldwide and run projects. I once hosted technical talks and training across three time zones, which not only let me apply project-management thinking and practice teaching skills, but also raised my visibility and genuinely improved the team's technical capability, contributing back to the team.
Opening up my horizons and interacting with people around the world is my favorite part. Being in Dublin especially, surrounded by engineers of different languages and backgrounds, there are endless opportunities for exchange across countries, languages, and cultures; besides technical discussions, outside of work you can also chat about each other's backgrounds and learn each other's culture.
If you are on the team in Taiwan, though, time zones do limit some of that interaction. Talking with colleagues around the world is still just a chat message away; the downside is that you may need to arrange extra time (getting up early, replying at night, or joining late meetings).
What I think most distinguishes this job from typical software development is that you face customers' production environments. Cloud Support Engineers regularly deal with high-pressure situations: broken features, failing applications, system crashes, and so on. Customers sometimes push you relentlessly for updates. In these situations you need to stay calm and focused, and not let the customer's emotions steer the investigation in the wrong direction.
I still remember the first time I faced a customer outage; I had to ask a senior engineer to help guide me through the customer's technical problem. But after observing, reviewing, and repeatedly practicing troubleshooting methods, I gradually turned that pressure into empathy for the customer and learned to handle thorny situations calmly, clarifying and eliminating causes correctly. Some may see this as a downside (e.g., difficult customers), but once you have been through it and crossed that uncomfortable threshold, it becomes an unshakeable, portable soft skill; it depends on whether you are willing to treat these challenges as chances to grow.
This kind of experience and training spills over into daily life without my noticing: when a system fails or something unexpected happens, my first instinct is to clarify the problem and figure out how to solve it, rather than spiral into anxiety driven by emotion or circumstance.
Unlike software development, Cloud Support Engineer is a role that demands a great deal of communication: sharing investigation results with customers in plain language and providing actionable, understandable steps for them to adopt (whether by email, chat, or phone). Customers come in many roles, so besides explaining problems and solutions in terms developers understand, you will inevitably meet customer-side managers who want the situation clarified; at the same time, various customer-facing roles inside AWS also join in helping the customer, so as a technical engineer I had to learn to communicate with all of these different roles.
To some degree, you learn to stand in others' shoes and empathize. In front of a product development team, I need to understand what problem the customer hit, how to reproduce it, point out the product's current defect, and suggest how to fix it. Beyond steering the investigation, you need a solid grasp of how the product works at its core, so you can speak the development team's language and get the issue fixed effectively. In front of the customer, I need to understand their problem and pain points, and offer practical recommendations that help them meet their business goals through the product's features, sometimes even guiding them toward fixes with long-term benefit (because customers often have their own ideas and are eager to solve only the immediate, short-term problem).
Beyond Amazon's organizational culture of writing documents, support engineers spend a lot of time translating complex investigation reports into language customers or product teams can understand, and writing correspondence customers can actually follow.
Besides being a multinational team, AWS Support itself offers a platform for growth and visibility.
On the Chinese-language DevOps/Container technical team, what I especially like is that the colleagues around me are very supportive and help each other, and are proactive about their own career planning rather than confining themselves to resolving individual Support Cases day to day.
Even though how many Support Cases (tickets) you resolve each day matters, much more of your growth comes from other kinds of work that build different skills. Because AWS Support works closely with customers across industries, one clear example is customer-facing training opportunities: sharing usage advice and best practices for AWS products with enterprise customers of all sizes.
Beyond that, to help more customers solve technical problems, there are always internal projects and programs. Whether through videos, technical articles, or training, team members level up their skills in different ways, benefiting not only Chinese-speaking customers but, more often, customers worldwide, and gaining global visibility along the way. For example, the following are various contributions from me or my colleagues:
AWS Knowledge Center
Better yet, my impressive colleagues tend to right wrongs with Pull Requests: submitting the corresponding patches, or building tools so that even more customers benefit, for example:
As a stepping stone into an AWS career, Cloud Support Engineer is genuinely a job full of opportunities for breadth of learning, which is what I personally find most interesting about it, for example:
Everyone defines work-life balance differently (in my view it is a relative feeling), but compared with many IT Support or engineering jobs at other companies, AWS Cloud Support Engineer is, "relatively speaking", probably not an easy job.
With the team's current working model, providing customers 24x7x365 uninterrupted support means that public holidays and weekends can be working days, and team members work shift rotations whose hours only partially overlap.
I still consider this acceptable, mainly based on the following observations:
(1) In the past, the Taiwan-based Chinese-language team's shifts had to cover a full 16 hours, handing off to the North America time zone only around 11 p.m. (which meant some colleagues worked until 11 p.m.). Since late shifts are not a healthy working pattern for many people, management kept seeking solutions soon after the Taiwan team was formed. With the establishment of the European team, things improved: Taiwan was freed from the late-shift nightmare and working hours shifted earlier and earlier.
(2) Even though Cloud Support Engineers also have an on-call mechanism, the team's on-call mostly follows working hours; being a global team, on-call shifts outside your working hours are covered by other time zones.
Most of the time, new hires have good control over work-life balance. But the more you want to do, the more responsibility lands on you, and the balance is not always guaranteed. For example, I have seen senior engineers occasionally join meetings after 21:00 to accommodate the Americas time zones. Still, "relatively speaking", compared with stories of some Amazon development teams being paged awake in the middle of the night, it is to some degree an acceptable work-life balance.
Hard to believe, but behind 24-hour uninterrupted customer service there are still AWS development teams carrying round-the-clock on-call who do have to get up in the middle of the night. Especially once you are an AWS support engineer with some seniority, you will handle production outages often enough to have plenty of chances to wake up developers in North America.
Having been woken at night myself, I fully agree that overnight on-call is a very unhealthy working pattern. The design of the work schedule is not my favorite aspect of the team, but in this respect the team as a whole is improving, if slowly. As more people join, the scheduling should approach something reasonable.
Fragmentation of daily working hours is my least favorite part. Before discussing it, you need to understand how much the broader environment drives this problem. Customer behavior differs by market: Chinese-speaking customers are used to instant messaging and immediate responses, which often leads to a lot of unexpected behavior when they use technical support.
Even though the documentation and product pages clearly define the different severities2 3, customers habitually choose the shortest response time when opening a case (as short as they can select) rather than the actual severity of the impact (e.g., choosing the 15-minute response level because a project launches next week, not because the system is actually down):
Even knowing that customers often choose case severities incorrectly, AWS Support still gives customers the final say. To some extent, this does lead to misuse of engineering resources. It is like the well-known Aesop fable of the boy who cried wolf: with too many false alarms, the team cannot correctly distinguish the outages that truly affect production; worse, with engineers tied up by piles of non-outage issues, the team cannot balance severities well or promptly help with failures where environments are genuinely impaired.
The effect is that the Chinese-language technical team must handle a large volume of these short, rapid replies and often loses the time to focus on technical problems. I have seen many new hires flounder because of this, pulled into handling piles of quick-response cases, unable to invest enough time in any single investigation: juggling case A, yanked over to case B, or forced to interrupt ongoing work and meetings to respond to customers, fragmenting the workday. If what is indirectly sacrificed is the long-term quality of service for customers, I believe the market dynamics leave much room for improvement.
By contrast, Japanese customers habitually choose severities like "General guidance" or "Degraded/Impaired system" that genuinely reflect their situation, for guidance and post-incident investigation. This market dynamic let me benefit indirectly from my Japanese-team colleagues: I learned an enormous amount from the replies each of them gives customers. Their messages are consistently detailed and complete; before replying they run extensive tests, propose various options or PoCs for reference, and have often already discussed the case internally (the downside being that customers need sufficient patience).
This is not to rank which customers are better. Having personally supported Indian customers, which is challenging in its own way, I know this is not a problem of any single locale; every customer base has its own traits. Rather, the first-hand experience deepened my appreciation of cultural and market differences.
Meanwhile, the team has been trying many different approaches to ease these pain points for engineers and to teach customers how to use AWS technical resources better and more correctly, so their truly important problems get solved.
With the team growing fast across multiple time zones, processes are revised and iterated so quickly, to adapt to different problem scenarios, that it is sometimes hard to keep up. For example, the workflow recommended today as A may become version A-1 tomorrow and A-10 a few weeks later. Because processes iterate constantly with customer needs and problems, they cannot always be applied promptly in every region; some regions even run entirely independent systems with their own way of operating.
"Rules changed overnight" is probably the best description of some of the team's current processes; the team keeps trying to introduce new methods and workflows. Even I sometimes feel lost, and have to go back to the internal documentation, or back to first-principles discussions, to check whether I have misunderstood anything.
Chaotic as it sometimes feels, this iterative process never stops on the team, and you have to learn to adapt to a fast-paced, ever-changing culture.
Finally, "learning how to be a better customer" is absolutely the biggest takeaway of my years in a customer-facing support engineer role, and deserves to top the list. Since the daily job is essentially customer-service work, you inevitably see every kind of customer, learn the different types, and learn what good and bad look like.
As a support engineer, with accumulating experience helping customers, you gradually learn to think from the customer-service side and to empathize with how hard that role is.
When you realize a customer does not respect your expertise, the frustration can absolutely crush your technical confidence. Often, even when your advice is correct and successfully solves the problem, the customer may treat you as just an AWS back-office worker, and you will not necessarily get any acknowledgment.
In my first years I could not understand this, often feeling that my effort at work earned no feedback at all; but looking around, plenty of customers are thoroughly professional and great to collaborate with, which taught me to accept different kinds of customers with equanimity. Through this I also learned how to be a "good customer" and how to show respect in daily life to customer-facing workers of all industries: a restaurant waiter, a bank phone agent, or anyone else in a customer-facing job. On my team, members also freely share their experience and how to handle different customers appropriately, and we grow through those lessons.
Since taking on a customer-service role myself, I empathize more with customer-service work across industries and place more value on the effort front-line staff put in. And when service falls short of my expectations, I offer concrete suggestions and ideas rather than pure complaints. Interestingly, this usually gets me more satisfying outcomes (not faster refunds or more compensation), while helping the product and service improve.
This post explored the experience of working as a Cloud Support Engineer at Amazon Web Services (AWS) and shared the valuable skills and lessons I picked up over the past five years. The points above are entirely personal observations and represent no official position.
If you are interested in the AWS Cloud Support Engineer role, I hope the content above helps; you can also refer to the other articles in the series to learn more.
For details, you can refer to my team's career-sharing session on the AWS Cloud Support Engineer role, which also covers parts of the interview process and some tips to help you better grasp the abilities our team values:
Please note that this is purely my personal view. I am sharing it so that anyone interested in applying for AWS Cloud Support Engineer understands which essential skills to build, along with some abilities that are very important when solving technical problems for customers day to day. The purpose of writing this is not to help anyone cram interview questions, nor does it represent any official guide.
Even if you demonstrate perfectly memorized answers in the interview, once on the team you will still be scared to death facing real customer problems, because customer problems are usually ill-defined and yet the customer expects you to provide answers. If all you can do is recite, you still cannot actually solve anything for a customer.
Also, the core skills the team hires for may change over time, and an interview is not an ordinary exam: what matters is understanding what skills you can contribute to the team, assessed comprehensively across multiple dimensions. So what follows is what my experience on the team and with customer cases tells me are essential skills, for reference only.
If you are still not quite sure what an AWS Cloud Support Engineer does, I highly recommend the related articles in my AWS career series to help you gradually build an understanding of the role:
Many applicants treat this job as ordinary environment-setup IT helpdesk or a plain customer-service position, assuming that following a runbook solves most problems at work. In reality, the work AWS Support does differs quite a lot from a typical company's IT Support: even though you interact with customers in ticket form, the role still leans toward consulting, and being pulled alone into a customer meeting to debate an issue one-against-ten is a scenario you may well encounter. I suggest being mentally prepared before applying for this job.
I often observe that many candidates, even after years in the IT industry, have big gaps in basic knowledge (for example, I have heard someone say you can use ping to test a website's port 80 to see whether the site is down). This is especially visible among engineers who have focused only on development work (strictly speaking, many software engineering jobs are about "implementation": the product specifications they face are largely already defined in existing packaged libraries or public solution APIs ready to be applied, so there may be little occasion to think about such core, low-level questions).
Whether developer or operations staff, satisfying requirements from the "user" perspective at the level of high-level applications and implementation may not demand deep fundamentals, which leaves people helpless when something breaks or the problem scenario gets complex (and I thank these people for keeping me employed).
But an AWS Cloud Support Engineer is like a doctor: a doctor must, from the symptoms the patient describes and the available information, propose the correct diagnostic steps, use the correct diagnostic tools (e.g., stethoscope, X-ray), and finally prescribe the right medicine to relieve the patient's symptoms. Likewise, when troubleshooting, the engineer must know, from their own understanding of the background of the customer's problem, what information to collect for analysis and which tools are the right ones to use.
Sometimes the information the customer gives is even wrong, which is where a comprehensive understanding from the fundamentals up to the application layer pays off, providing the correct direction for investigation; otherwise you are just poking around blindly.
Networking basics include:
DNS is about as fundamental as fundamentals get, yet I have personally met many candidates who are not even familiar with the basic DNS protocol (and some customers working in IT are not that familiar either). The most common gaps are the concrete DNS query flow, the structure of the DNS protocol, and troubleshooting DNS problems.
When debugging or facing a system failure, they point the finger straight at the application or the service side, never considering that the real cause is an incorrect DNS configuration or some unexpected DNS behavior.
On networking fundamentals there are far too many free resources to list, including some excellent ones that concretely help you understand; the links below alone are probably more than you can finish, so I will not enumerate further:
Note that the depth of the questions still depends on the technical skills the hiring team values. A team focused on networking-related AWS services will go deeper into networking; for other specialty teams, whose focus is supporting particular AWS products, basic network-troubleshooting ability is enough to cover customer scenarios.
Since I am personally not familiar with Windows, to avoid misleading anyone I will only list the Linux resources I find very practical, plus the most basic of basic file-system chapters (honestly, carefully reading and practicing all of the Niao Ge (鳥哥) Linux tutorial chapters is probably enough for 60-80% of Linux operations scenarios):
Linux (file system/operation/administration knowledge)
If you have no experience in this area at all, then building a Web Service (HTTP) yourself, covering the knowledge required from networking fundamentals through basic operating-system administration, is essential.
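As a minimal way to practice this end to end (a sketch of mine, assuming a Unix-like machine with python3 and curl installed; the paths and port are arbitrary), you can stand up a bare HTTP server locally and then trace each layer of a request against it:

```shell
# Hypothetical practice lab: serve a static page, then inspect the request path.
mkdir -p /tmp/weblab
echo '<h1>hello weblab</h1>' > /tmp/weblab/index.html

# Start a throwaway HTTP server on port 8123 in the background.
( cd /tmp/weblab && python3 -m http.server 8123 >/dev/null 2>&1 & )
sleep 1

# Fetch the page; from here you can dig into each layer yourself:
# name resolution, the TCP handshake, the HTTP request/response headers.
curl -s http://127.0.0.1:8123/
```

Repeating the same request with curl -v, or alongside a packet capture, exposes the TCP and HTTP layers that the fundamentals above describe.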
Here are several traits I consider important when applying for the AWS Cloud Support Engineer role:
The main job of an AWS Cloud Support Engineer is helping customers, which often means breaking complex technical problems into understandable steps, so that customers, and even other teams (such as development teams), know clearly how to troubleshoot and what to fix. You need to convey information clearly and precisely, explaining both the problem and the solution.
Unlike typical software development work, AWS Cloud Support Engineers handle system-outage scenarios and inevitably absorb pressure from customer environments. For example, when an outage is hurting revenue, in a tense, high-pressure situation the customer just wants the problem fixed immediately (hurry, hurry, hurry), and even otherwise professional IT people can suddenly become quite irrational.
Imagine that you entered the incident only five minutes ago: you are live troubleshooting while clarifying the problem, checking each item, and guiding the customer through the correct verification steps (because sometimes the information the customer gives is wrong). Yet some customers will still feel you are wasting their time; the importance of staying calm and patient is especially pronounced in these situations. Things I have personally heard include:
The customer may be emotional, but you cannot help while carrying emotion yourself, or everyone just panics together (everyone hurrying at once). This job really did teach me many soft communication skills; in my experience, pulling a crowd of uninvolved people into the incident usually only muddies the water and does not help the investigation much.
Think about whether you have had similar experiences: how you communicated effectively with others, how you solved complex technical problems, and how you handled emergencies.
An AWS Cloud Support Engineer needs to identify and solve problems effectively and precisely. Basic logic and good analytical skills are required, along with the ability to quickly grasp the essence of a problem and synthesize information from multiple angles to pinpoint the crux.
Simply put: the troubleshooting logic must be sound, the analysis correct, the methods and tools right; know how to investigate when a problem occurs, why you use those tools, and why you check A rather than B. For example, when a website is unreachable, why use ping rather than another tool? What do ping's results mean? And once you have the results, what is the next direction of investigation?
The opposite is collecting a pile of useless information and guessing wildly, making the problem even more diffuse (the counterexample being the earlier claim of using ping to check whether port 80 is reachable).
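To make the counterexample concrete (a sketch of mine, not from the original; the helper name is hypothetical), ping only exercises ICMP and says nothing about whether TCP port 80 accepts connections; "is the website down" is a question for a TCP/HTTP-level tool:

```shell
# ping tests ICMP reachability only. Whether "the site is down" is a question
# about TCP port 80 and the HTTP service, so test that layer instead.

# Hypothetical helper: succeeds only if an HTTP server answers on host:port.
check_http_port() {
  curl -s -o /dev/null --max-time 5 "http://$1:$2/"
}

# Usage:
#   check_http_port example.com 80 && echo "port 80 answers HTTP"
# A raw TCP connect test (no HTTP) would be: nc -z -w 5 example.com 80
```

A host can answer ping while its web server is dead, and can block ICMP entirely while serving HTTP fine, which is exactly why the two checks are not interchangeable.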
AWS products are constantly launching; you basically can never finish learning them all, so this job forces you to keep up with customers' rapid pace through continuous learning and self-improvement.
Because this role flips you from the user side to the problem-solving side, you must also be able to research problems deeply within your specialty; only then can you give a concrete, clear direction for investigation when a customer throws out an unknown, ill-defined problem.
Each specialty has its own emphasis. For example, a Database-focused engineer and a Linux-focused engineer may define the required Linux knowledge differently: the Database engineer may only need to understand Linux basics, know some common commands, and grasp the file system, file permissions, and basic troubleshooting; the Linux engineer must deeply understand how Linux processes work, know how to use Linux tools to understand system performance, know how to read a kernel dump, how to troubleshoot, and so on.1
Each specialty has its required fundamentals, but the team's skill tree evolves with customer needs, and the problems we solve keep changing. Here are some learning resources I believe are very helpful for every specialty team:
The job of an AWS Cloud Support Engineer is to support AWS customers, so you need to be familiar with AWS services and products and able to help customers solve the problems they run into. You also need to know how to configure and manage AWS environments, and how to troubleshoot them. A deep understanding of AWS technology will serve you very well after joining.
For other technical specialties, refer to the corresponding skills mentioned in the job description; the skill tree each technical team values differs somewhat depending on the products it focuses on, and the product pages give a rough idea of the details:
I would not claim to know all of AWS Support very well, but for teams in the DevOps and container-technology space, I know the area well enough to share something.
The AWS services my team covers mostly include the following:
Currently, one person on my team may support close to 40 different AWS services; basically, any of the services above may come my way when a customer raises a problem.
Given that my team is also actively looking for suitable people, below are the technical experience and abilities my team values highly, with some learning resources attached for part of them:
Linux
Kubernetes / Docker
CI/CD
This post outlined the technical skills and traits an AWS Cloud Support Engineer needs, along with learning materials worth consulting. If you are considering joining an AWS Cloud Support Engineer team, I hope it helps you build a fuller picture.
It also serves as my own basic guide to the long-term learning path for the AWS Cloud Support Engineer technical role, and should help you think about how to correctly demonstrate these abilities with concrete examples when answering questions.
The AWS Load Balancer Controller documentation mentions several concrete caveats2; however, some scenarios are not necessarily captured in the docs (or I just happened to hit them). Below are a few interesting issues I personally noticed while helping users migrate AWS Load Balancer Controller to v2.
Some of you may have noticed that during the upgrade from the v1 controller to the v2 controller, the v2 controller creates a new ELB resource and deploys the related resources onto it to complete the migration; the ELB resource created by v1 is no longer used. So if anything depends on the old ELB address (e.g. fe584233-echoserver-echose-XXXX-XXXXXXX.ap-northeast-1.elb.amazonaws.com) or on associated services, remember to update the corresponding DNS records. The documentation does explicitly mention this behavior:
The AWS LoadBalancer resource created for your Ingress will be preserved. If migrating from <v1.1.3, a new AWS LoadBalancer resource will be created and the old AWS LoadBalancer will remain in the account. However, the old AWS LoadBalancer will not be used for the ingress resource.
For many users this may not be a big deal; but for users with many systems already depending on a single ELB resource, who installed without much up-front planning and just went for it, this upgrade is about as pleasant as going to the bathroom and still being constipated.
So customers still running v1.0.1 ingeniously proposed an upgrade path I had never even considered:
Since the documentation says <v1.1.3, can I first upgrade past that version (for example, to v1.1.9) and then upgrade to v2, keeping the original ELB resource? The upgrade path would be v1.0.1 -> v1.1.9 -> v2.
Logically there seems to be no flaw, but after reading the AWS Load Balancer Controller source code, sadly, the idea is just too good to be true.
Based on the observation that every ELB resource created by AWS Load Balancer Controller v2 carries the reserved k8s string in its name, I could predict that a migration from any old v1 controller would most likely be unable to reuse the old ELB resource.
Still, that is a bold assumption about the behavior above, and it deserves careful verification. Seeking the truth, I ran a simple reproduction in my environment, which directly demolished the proposed upgrade path.
First, I ran the v1.0.1 controller in my environment. After a bit of archaeology and kubectl convert (for the legacy API declarations), I got the old v1.0.1 controller successfully installed and running on Kubernetes 1.20, and deployed a simple sample application:
# Running controller v1.0.1
$ kubectl apply -f https://raw.githubusercontent.com/kubernetes-sigs/aws-alb-ingress-controller/v1.0.1/docs/examples/rbac-role.yaml
$ wget https://raw.githubusercontent.com/kubernetes-sigs/aws-alb-ingress-controller/v1.0.1/docs/examples/alb-ingress-controller.yaml
$ kubectl apply -f alb-ingress-controller.yaml
$ kubectl logs -n kube-system $(kubectl get po -n kube-system | egrep -o "alb-ingress[a-zA-Z0-9-]+")
W0130 12:45:29.518393 1 client_config.go:552] Neither --kubeconfig nor --master was specified. Using the inClusterConfig. This might not work.
-------------------------------------------------------------------------------
AWS ALB Ingress controller
Release: v1.0.1
Build: git-ebac62dd
Repository: https://github.com/kubernetes-sigs/aws-alb-ingress-controller.git
-------------------------------------------------------------------------------
# Running a sample application
$ kubectl describe ing -n echoserver echoserver
Address: fe584233-echoserver-echose-XXXX-XXXXXXX.ap-northeast-1.elb.amazonaws.com
...
Then I upgraded directly to v1.1.9. Even though the ALB Ingress Controller performs a refresh, it preserved the original deployment association for the existing Ingress object (fe584233-echoserver-echose-XXXX-XXXXXXX.ap-northeast-1.elb.amazonaws.com):
# Upgrade and deploy to v1.1.9
$ kubectl apply -f https://raw.githubusercontent.com/kubernetes-sigs/aws-alb-ingress-controller/v1.1.9/docs/examples/rbac-role.yaml
# Use kubectl and update the image to "docker.io/amazon/aws-alb-ingress-controller:v1.1.9"
$ kubectl logs -n kube-system $(kubectl get po -n kube-system | egrep -o "alb-ingress[a-zA-Z0-9-]+")
W0130 13:05:04.770613 1 client_config.go:549] Neither --kubeconfig nor --master was specified. Using the inClusterConfig. This might not work.
-------------------------------------------------------------------------------
AWS ALB Ingress controller
Release: v1.1.9
Build: 6c19d2fb
Repository: https://github.com/kubernetes-sigs/aws-alb-ingress-controller.git
-------------------------------------------------------------------------------
# ELB name doesn't change
$ kubectl describe ing -n echoserver echoserver
Address: fe584233-echoserver-echose-XXXX-XXXXXXX.ap-northeast-1.elb.amazonaws.com
Throughout the preparation for the v2 upgrade, the ELB resource persisted and the Ingress stayed associated with the old ELB; however, as soon as v2 was deployed, a new ELB resource was created immediately under a different name (k8s-echoserv-echoserv-XXXXXXXX-XXXXXXX), and the associated Ingress and Kubernetes Service both migrated to the new ELB resource:
# Update the controller to v2
# The old ALB Ingress controller has been uninstalled at this moment, and can see the ingress object is still preserved
$ kubectl describe ing -n echoserver echoserver
Address: fe584233-echoserver-echose-XXXX-XXXXXXX.ap-northeast-1.elb.amazonaws.com
....
$ helm install aws-load-balancer-controller eks/aws-load-balancer-controller \
-n kube-system \
--set clusterName=eks \
--set serviceAccount.create=false \
--set serviceAccount.name=aws-load-balancer-controller
# Once v2 controller has been installed, the controller will update the ELB name
$ kubectl describe ing -n echoserver echoserver
Address: k8s-echoserv-echoserv-XXXXXXXX-XXXXXXX.ap-northeast-1.elb.amazonaws.com
...
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal SuccessfullyReconciled 11s ingress Successfully reconciled
At this point, the old ELB resource (fe584233-echoserver-echose-XXXX-XXXXXXX.ap-northeast-1.elb.amazonaws.com) still exists; the AWS Load Balancer Controller simply no longer manages or operates it, and no longer registers the Kubernetes Service into the related Target Group resources.
From the deployment above, we can observe that in the old days the v1 controller composed names from the namespace + ingress name. This is indeed how v1.0.1 calls it (Source: L299-L317), and v1.1.9 does the same (Source: L285-L304). A short NameLB subroutine captures all of our predecessors' string-handling artistry (Source: v1.0.1, v1.1.9):
func (gen *NameGenerator) NameLB(namespace string, ingressName string) string {
hasher := md5.New()
_, _ = hasher.Write([]byte(namespace + ingressName))
hash := hex.EncodeToString(hasher.Sum(nil))[:4]
r, _ := regexp.Compile("[[:^alnum:]]")
name := fmt.Sprintf("%s-%s-%s",
r.ReplaceAllString(gen.ALBNamePrefix, "-"),
r.ReplaceAllString(namespace, ""),
r.ReplaceAllString(ingressName, ""),
)
if len(name) > 26 {
name = name[:26]
}
name = name + "-" + hash
return name
}
In v2, beyond functional improvements, the ALB Ingress Controller indeed underwent several rounds of refactoring. The most visible change is the naming above: the v2 naming follows richer rules (Source: v2.4.4, L90-L124):
func (t *defaultModelBuildTask) buildLoadBalancerName(_ context.Context, scheme elbv2model.LoadBalancerScheme) (string, error) {
...
if len(explicitNames) == 1 {
name, _ := explicitNames.PopAny()
// The name of the loadbalancer can only have up to 32 characters
if len(name) > 32 {
return "", errors.New("load balancer name cannot be longer than 32 characters")
}
return name, nil
}
if len(explicitNames) > 1 {
return "", errors.Errorf("conflicting load balancer name: %v", explicitNames)
}
uuidHash := sha256.New()
_, _ = uuidHash.Write([]byte(t.clusterName))
_, _ = uuidHash.Write([]byte(t.ingGroup.ID.String()))
_, _ = uuidHash.Write([]byte(scheme))
uuid := hex.EncodeToString(uuidHash.Sum(nil))
if t.ingGroup.ID.IsExplicit() {
payload := invalidLoadBalancerNamePattern.ReplaceAllString(t.ingGroup.ID.Name, "")
return fmt.Sprintf("k8s-%.17s-%.10s", payload, uuid), nil
}
sanitizedNamespace := invalidLoadBalancerNamePattern.ReplaceAllString(t.ingGroup.ID.Namespace, "")
sanitizedName := invalidLoadBalancerNamePattern.ReplaceAllString(t.ingGroup.ID.Name, "")
return fmt.Sprintf("k8s-%.8s-%.8s-%.10s", sanitizedNamespace, sanitizedName, uuid), nil
}
Besides stricter validation of ELB names, v2 now hashes the cluster name, Ingress Group, and several associated identifiers into a UUID, finally composing, with the k8s- prefix, the regularized names that everyone sees and not everyone loves.
This section spent hundreds of words conveying that v2 really does change a lot. To summarize:
If migrating from the v1 controller to v2 touches a scenario you currently face or will face, the generation of a new ELB resource is very likely and should be expected.
When planning the migration, if you have not yet placed another access layer in front of the ELB resource, a common practice is to manage the access location through DNS records (CNAME) to absorb ELB resource changes and reduce the client-side impact of the upgrade behavior above; it is also advisable to plan a corresponding maintenance window and account for the DNS record update in your change log.
After all, many things are just temporary bumps in life: under this mechanism, there is nothing that flushing the DNS cache and a "please try again later" message cannot solve.
Kubernetes does not natively provision an Application Load Balancer (ALB) for a Kubernetes Service object with type=LoadBalancer. Therefore, if you would like to expose your container service with an Application Load Balancer (ALB) on EKS, it is recommended to integrate the AWS Load Balancer Controller (formerly the ALB Ingress Controller, initially created by CoreOS and Ticketmaster). This controller makes it possible to manage load balancers from Kubernetes deployments.
Below is an overview diagram describing the controller workflow:
Note: the AWS ALB Ingress Controller has been superseded and renamed to "AWS Load Balancer Controller", with several new features. For more detail, please refer to the GitHub project - kubernetes-sigs/aws-alb-ingress-controller
Using an Application Load Balancer as the example: when running, the AWS Load Balancer Controller is deployed as a Pod on your worker nodes and continuously monitors/watches your cluster state. Whenever there is a request to create an Ingress object, the AWS Load Balancer Controller manages and creates the Application Load Balancer resource for you. Here is part of an example v1.1.8 deployment manifest:
apiVersion: apps/v1
kind: Deployment
metadata:
labels:
app.kubernetes.io/name: alb-ingress-controller
name: alb-ingress-controller
namespace: kube-system
spec:
selector:
matchLabels:
app.kubernetes.io/name: alb-ingress-controller
template:
metadata:
labels:
app.kubernetes.io/name: alb-ingress-controller
spec:
containers:
- name: alb-ingress-controller
args:
# Setting the ingress-class flag below ensures that only ingress resources with the
# annotation kubernetes.io/ingress.class: "alb" are respected by the controller. You may
# choose any class you'd like for this controller to respect.
- --ingress-class=alb
# REQUIRED
# Name of your cluster. Used when naming resources created
# by the ALB Ingress Controller, providing distinction between
# clusters.
# - --cluster-name=devCluster
# AWS VPC ID this ingress controller will use to create AWS resources.
# If unspecified, it will be discovered from ec2metadata.
# - --aws-vpc-id=vpc-xxxxxx
# AWS region this ingress controller will operate in.
# If unspecified, it will be discovered from ec2metadata.
# List of regions: http://docs.aws.amazon.com/general/latest/gr/rande.html#vpc_region
# - --aws-region=us-west-1
image: docker.io/amazon/aws-alb-ingress-controller:v1.1.8
serviceAccountName: alb-ingress-controller
The deployment runs one replica of the ALB Ingress Controller (pod/alb-ingress-controller-xxxxxxxx-xxxxx) in kube-system:
NAMESPACE NAME READY STATUS RESTARTS AGE
kube-system pod/alb-ingress-controller-5fd8d5d894-8kf7z 1/1 Running 0 28s
NAMESPACE NAME READY UP-TO-DATE AVAILABLE AGE
kube-system deployment.apps/alb-ingress-controller 1/1 1 1 3m48s
Since v2, the controller has added many different custom resources and enhancements, but the core deployment still preserves much of what is mentioned in this post. Depending on your environment, the default and suggested installation steps may also involve configuring IRSA (IAM Roles for Service Accounts) to grant the AWS Load Balancer Controller Pods permission to operate AWS resources (e.g. ELB), so it is recommended to take a look at the official documentation to quickly understand how to install the controller:
In addition, the service can be exposed via an Ingress object. For example, deploying the simple 2048 application:
$ kubectl apply -f https://raw.githubusercontent.com/kubernetes-sigs/aws-alb-ingress-controller/v1.1.8/docs/examples/2048/2048-namespace.yaml
$ kubectl apply -f https://raw.githubusercontent.com/kubernetes-sigs/aws-alb-ingress-controller/v1.1.8/docs/examples/2048/2048-deployment.yaml
$ kubectl apply -f https://raw.githubusercontent.com/kubernetes-sigs/aws-alb-ingress-controller/v1.1.8/docs/examples/2048/2048-service.yaml
$ kubectl apply -f https://raw.githubusercontent.com/kubernetes-sigs/aws-alb-ingress-controller/v1.1.8/docs/examples/2048/2048-ingress.yaml
The file 2048-ingress.yaml declares the annotations and spec in a format the ALB Ingress Controller can recognize (before Kubernetes 1.18):
apiVersion: extensions/v1beta1
kind: Ingress
metadata:
name: "2048-ingress"
namespace: "2048-game"
annotations:
kubernetes.io/ingress.class: alb
alb.ingress.kubernetes.io/scheme: internet-facing
labels:
app: 2048-ingress
spec:
rules:
- http:
paths:
- path: /*
backend:
serviceName: "service-2048"
servicePort: 80
Before the IngressClass resource and the ingressClassName field were added in Kubernetes 1.18, Ingress classes were specified with a kubernetes.io/ingress.class annotation on the Ingress. So with controller version v2.x, you should see the ingress specification defined as below:
$ kubectl apply -f https://raw.githubusercontent.com/kubernetes-sigs/aws-load-balancer-controller/v2.4.1/docs/examples/2048/2048_full.yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
namespace: game-2048
name: ingress-2048
annotations:
alb.ingress.kubernetes.io/scheme: internet-facing
alb.ingress.kubernetes.io/target-type: ip
spec:
ingressClassName: alb
rules:
- http:
paths:
- path: /
pathType: Prefix
backend:
service:
name: service-2048
port:
number: 80
The Ingress object constructs the ELB listeners according to the rules and forwards connections to the corresponding backend (serviceName), which matches the service service-2048; any traffic matching the rule /* is routed to the group of selected Pods. In this case, the Pods are exposed on the worker nodes via type=NodePort:
Here is the definition of this Kubernetes service:
apiVersion: v1
kind: Service
metadata:
name: "service-2048"
namespace: "2048-game"
spec:
ports:
- port: 80
targetPort: 80
protocol: TCP
type: NodePort
selector:
app: "2048"
Zero-downtime deployment is always a big challenge for DevOps/operations teams running any kind of business. When you adopt the controller as the solution to expose your service, a couple of things need care due to the combined behavior of Kubernetes, the ALB, and the AWS Load Balancer Controller. Achieving zero downtime requires considering many perspectives, and new challenges pop up when you roll out a new deployment of your Pods behind the AWS Load Balancer Controller.
Let's use the 2048 game as an example to describe the scenario of rolling out a new version of your container application. In my environment, service/service-2048 uses NodePort to expose the service:
NAMESPACE NAME READY STATUS RESTARTS AGE
2048-game pod/2048-deployment-58fb66554b-2f748 1/1 Running 0 53s
2048-game pod/2048-deployment-58fb66554b-4hz5q 1/1 Running 0 53s
2048-game pod/2048-deployment-58fb66554b-jdfps 1/1 Running 0 53s
2048-game pod/2048-deployment-58fb66554b-rlpqm 1/1 Running 0 53s
2048-game pod/2048-deployment-58fb66554b-s492n 1/1 Running 0 53s
NAMESPACE NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
2048-game service/service-2048 NodePort 10.100.53.119 <none> 80:30337/TCP 52s
NAMESPACE NAME READY UP-TO-DATE AVAILABLE AGE
2048-game deployment.apps/2048-deployment 5/5 5 5 53s
And of course, once the controller is correctly set up and has provisioned the ELB resource, the full ELB domain name is recorded on the Ingress object:
$ kubectl get ingress -n 2048-game
NAME HOSTS ADDRESS PORTS AGE
2048-ingress * xxxxxxxx-2048game-xxxxxxxx-xxxx-xxxxxxxxx.ap-northeast-1.elb.amazonaws.com 80 11m
I can use the DNS name as the endpoint to visit my container service:
$ curl -s xxxxxxxx-2048game-xxxxxxxx-xxxx-xxxxxxxxx.ap-northeast-1.elb.amazonaws.com | head
<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8">
<title>2048</title>
<link href="style/main.css" rel="stylesheet" type="text/css">
<link rel="shortcut icon" href="favicon.ico">
...
This application could be any kind of critical service you run. As an administrator, SRE (Site Reliability Engineer), member of an operations team, or DevOps engineer, your goal and duty is to ensure the service runs properly, without issues or interruption (sometimes that means a good night's sleep). That is why the people who actually get their hands dirty maintaining day-to-day operations usually dislike adopting service changes: change generally means instability.
Whether you like change or not, new business requests still bring challenges: your developers say "Oh! we need to upgrade the application", "we are going to roll out a bug fix", "the new feature is going online". No one can guarantee one hundred percent that the service will run properly after any change, because systems have limitations and trade-offs, and any service downtime can make stakeholders (users, the operations team, or leadership) unhappy.
The question, however, is: can we address these problems better once we know the limitations and the behavior? Some people in Taiwan even put Kuai Kuai snacks on their workstations because they believe it keeps services happy; I am not particularly devoted to that method, so in the following section I will walk through more realistic logic and phenomena using the 2048 game as my sample service.
I am going to use a simple loop trick to continuously access my service via the endpoint xxxxxxxx-2048game-xxxxxxxx-xxxx-xxxxxxxxx.ap-northeast-1.elb.amazonaws.com to demonstrate a scenario: this is a popular web service that customers constantly need to access (e.g. a social-media platform, a bitcoin trading platform, or anything else where we have essentially zero tolerance for downtime because it impacts revenue), as below:
$ while true;do ./request-my-service.sh; sleep 0.1; done
HTTPCode=200_TotalTime=0.010038
HTTPCode=200_TotalTime=0.012131
HTTPCode=200_TotalTime=0.005366
HTTPCode=200_TotalTime=0.010119
HTTPCode=200_TotalTime=0.012066
HTTPCode=200_TotalTime=0.005451
HTTPCode=200_TotalTime=0.010006
HTTPCode=200_TotalTime=0.012084
HTTPCode=200_TotalTime=0.005598
HTTPCode=200_TotalTime=0.010086
HTTPCode=200_TotalTime=0.012162
HTTPCode=200_TotalTime=0.005278
HTTPCode=200_TotalTime=0.010326
HTTPCode=200_TotalTime=0.012193
HTTPCode=200_TotalTime=0.005347
...
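The contents of request-my-service.sh are not shown above; a minimal reconstruction of mine (a sketch; the function name and URL handling are my assumptions) that produces the same HTTPCode=..._TotalTime=... output shape would be:

```shell
# Hypothetical reconstruction of request-my-service.sh: hit the endpoint once
# and print the HTTP status code plus the total request time on one line.
request_my_service() {
  # $1 = endpoint URL; prints e.g. HTTPCode=200_TotalTime=0.010038
  curl -s -o /dev/null --max-time 10 \
    -w "HTTPCode=%{http_code}_TotalTime=%{time_total}\n" \
    "$1"
}

# Loop form equivalent to the one shown above:
#   while true; do request_my_service "http://<elb-dns-name>/"; sleep 0.1; done
```

curl's --write-out variables do the heavy lifting here: http_code is 000 when the connection fails outright, which also makes connection-level outages visible in the loop.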
Meanwhile, I am using the RollingUpdate strategy in my Kubernetes Deployment with maxUnavailable=25%, which means that when Kubernetes needs to update or patch something (like the image or environment variables), the number of unavailable Pods cannot exceed 25%, ensuring that at least 75% of the desired number of Pods stay up (only 1-2 Pods are replaced at a time when I have 5 replicas):
apiVersion: apps/v1
kind: Deployment
metadata:
name: 2048-deployment
namespace: 2048-game
spec:
...
selector:
matchLabels:
app: "2048"
...
strategy:
rollingUpdate:
maxSurge: 25%
maxUnavailable: 25%
type: RollingUpdate
When rolling out the new version of my container application (for example, updating the deployment to replace the container image with the new nginx image), there can be a window in which a few hits return HTTP status code 502:
If you configure the controller to use instance mode to register targets (Pods) in your ELB Target Group, it uses the worker nodes' instance IDs and exposes your service in that target group through the Kubernetes NodePort. In this case, traffic follows the Kubernetes networking design for a second hop according to the externalTrafficPolicy defined in the Kubernetes Service object (whether externalTrafficPolicy=Cluster or externalTrafficPolicy=Local).
Because the controller only registers worker nodes in the ELB target group, a scenario that does not involve worker-node replacement sees minimal or even no downtime (downtime should be rare if Kubernetes forwards the traffic perfectly); however, the real world does not operate that way, and a few seconds of downtime can still occur due to the workflow below:
This is the general workflow when a client reaches the service endpoint (ELB) and how the traffic flows:
Client ----> ELB ----> Worker Node (iptables) / In this step it might be forwarded to other Worker Node ----> Pod
So, in these cases, you can see downtime:
- An old Pod is already in Terminating state but has not responded yet, so the ELB does not get a response from that Pod.
If you strip away the encapsulation of the Kubernetes networking layer and simplify things with the AWS-supported CNI plugin (relying on the ELB to forward traffic directly to the Pods by using IP mode, with the annotation alb.ingress.kubernetes.io/target-type: ip on my Ingress object), the downtime during a Pod RollingUpdate becomes even more visible. That is because, beyond the issues mentioned in cases (1)/(2)/(3), the controller's behavior raises a separate topic that must be covered when the goal is zero-downtime deployment:
Here is an example using IP mode (alb.ingress.kubernetes.io/target-type: ip) as the registration type to route traffic directly to the Pod IP:
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
namespace: game-2048
name: ingress-2048
annotations:
alb.ingress.kubernetes.io/scheme: internet-facing
alb.ingress.kubernetes.io/target-type: ip
spec:
ingressClassName: alb
rules:
- http:
paths:
- path: /
pathType: Prefix
backend:
service:
name: service-2048
port:
number: 80
Following the issues (1) (2) (3) mentioned above, a similar problem can be observed during the rolling update (I replaced the image again, this time in IP mode). Potentially, you may notice 10-15 seconds or even longer of downtime when doing the same lab:
While Kubernetes rolls the Deployment, you can see in the target group that the AWS Load Balancer Controller starts draining the old targets (old Pods) at the same time. However, you can still observe HTTP 502/504 errors exceeding 3-10 seconds for a single request:
HTTPCode=200_TotalTime=0.005413
2048
HTTPCode=200_TotalTime=0.009980
502 Bad Gateway
HTTPCode=502_TotalTime=3.076954
2048
HTTPCode=200_TotalTime=0.005700
2048
HTTPCode=200_TotalTime=0.010019
502 Bad Gateway
HTTPCode=502_TotalTime=3.081601
2048
HTTPCode=200_TotalTime=0.005527
502 Bad Gateway
HTTPCode=502_TotalTime=3.070947
502 Bad Gateway
HTTPCode=502_TotalTime=3.187812
504 Gateway Time-out
HTTPCode=504_TotalTime=10.006324
Welcome to nginx!
HTTPCode=200_TotalTime=0.011838
Welcome to nginx!
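As a side note, if you are reproducing this lab, the probe output above can be tallied with a short helper. This is only a sketch of the kind of script I use; it assumes the log lines follow the `HTTPCode=<code>_TotalTime=<seconds>` format produced by `curl -w` as shown above, with response-body lines in between.

```python
from collections import Counter

def summarize(lines):
    """Tally status codes from probe lines like 'HTTPCode=502_TotalTime=3.076954'."""
    codes = Counter()
    total_time = 0.0
    for line in lines:
        if not line.startswith("HTTPCode="):
            continue  # skip response-body lines such as '2048'
        code_part, time_part = line.split("_TotalTime=")
        codes[code_part.removeprefix("HTTPCode=")] += 1
        total_time += float(time_part)
    return codes, total_time

sample = [
    "2048",
    "HTTPCode=200_TotalTime=0.005413",
    "502 Bad Gateway",
    "HTTPCode=502_TotalTime=3.076954",
    "HTTPCode=504_TotalTime=10.006324",
]
codes, elapsed = summarize(sample)
print(dict(codes))  # {'200': 1, '502': 1, '504': 1}
```

Counting the 5xx lines this way makes it easy to compare the downtime window between instance mode, IP mode, and the readiness-gate setup discussed later.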
Let’s use this scenario, as it is an edge problem we need to consider for most use cases. The issue brings out the core topic we want to address, and gives a good entry point to dive deep into the workflow between Kubernetes, the AWS Load Balancer Controller, and the ELB, which can lead to HTTP 502/503 (5xx) errors during deployment when Pods are terminated.
Before diving into it, we need to know that when a Pod is being replaced, the AWS Load Balancer Controller registers the new Pod in the target group and removes the old Pods. However, at the same time:
- New targets stay in the initial state until they pass the defined health check threshold (the ALB health check).
- Old targets stay in the draining state until they finish draining in-flight connections, or until the Deregistration delay defined in the target group is reached.
This results in the service being unavailable and returning HTTP 502.
To better understand this, I made the following diagrams; they may help you follow the workflow:
1) In the diagrams, I use IP addresses to mark the Pods so you can tell the new ones from the old. Here is the initial deployment.
2) At this stage, I update the container image and start rolling out new copies of the Pods. In the meantime, the controller makes the RegisterTarget API call to the ELB on behalf of Kubernetes.
3) Meanwhile, the DeregisterTarget API is called by the AWS Load Balancer Controller, and the new targets are in the initial state.
4) At this stage, anything could happen and cause a service outage. The DeregisterTarget API call might take some time to process, but Kubernetes has no mechanism to monitor the current state of the ELB target group; it only cares about rolling out the new version of the Pods and terminating the old ones.
In this case, if a Pod is terminated by Kubernetes while Target-1 or Target-2 is still in the ELB target group in the Active/Healthy state (it takes a few seconds to become Unhealthy, once the ELB HTTP health check threshold is reached), the ELB cannot forward front-end requests to the backend correctly.
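To put a rough number on this black-hole window: the ELB keeps routing to a terminated Pod until enough consecutive health checks fail. A back-of-the-envelope sketch (the interval and threshold values here are illustrative, not your target group's actual settings):

```python
def blackhole_window_seconds(check_interval, unhealthy_threshold):
    """Worst-case seconds a terminated Pod can stay Healthy in the target
    group: the ELB needs `unhealthy_threshold` consecutive failed checks,
    spaced `check_interval` seconds apart, before it stops routing to it."""
    return check_interval * unhealthy_threshold

# e.g. a 10-second interval with a threshold of 2 failed checks
print(blackhole_window_seconds(10, 2))  # 20 -> up to ~20s of HTTP 502s
```

Tightening the health check interval and threshold shrinks this window, but cannot eliminate it, which is why the readiness gate and preStop approaches below matter.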
5) The ELB receives the DeregisterTarget request, so the target group starts connection draining: it marks the old targets Target-1/Target-2 as draining, and no new connections are routed to them.
6) However, this brings another issue: if the new targets (Target-3 and Target-4) are still working on passing the ELB health check (they are currently in the initial state), there is no backend able to serve traffic at this moment, so the ELB can only return HTTP 5xx status codes.
7) Only once the new Pods are in the Running state and can answer the ELB’s health check requests over HTTP/HTTPS does the ELB finally mark the targets as Active/Healthy, and the service becomes available.
Since version v1.1.6, the AWS Load Balancer Controller (ALB Ingress Controller) has provided Pod readiness gates. This feature monitors the rolling deployment state and pauses the deployment on any unexpected issue (such as timeout errors from AWS APIs), which guarantees you always have healthy Pods in the target group, even when ELB API calls fail during a rolling update.
As the workflow above shows, if you would like to prevent the downtime, several workarounds are required to keep the Pod state consistent between the ALB, the ALB Ingress Controller, and Kubernetes.
In the past, the readiness gate was configured manually (version 1) in the Pod spec. Here is an example adding a readiness gate with conditionType: target-health.alb.ingress.k8s.aws/<ingress name>_<service name>_<service port> (this might change over time; for details, please refer to the documentation in the AWS Load Balancer Controller project on GitHub):
apiVersion: v1
kind: Service
metadata:
name: nginx-service
spec:
clusterIP: None
ports:
- port: 80
protocol: TCP
targetPort: 80
selector:
app: nginx
---
apiVersion: extensions/v1beta1
kind: Ingress
metadata:
name: nginx-ingress
annotations:
kubernetes.io/ingress.class: alb
alb.ingress.kubernetes.io/target-type: ip
alb.ingress.kubernetes.io/scheme: internal
spec:
rules:
- http:
paths:
- backend:
serviceName: nginx-service
servicePort: 80
path: /*
---
apiVersion: apps/v1
kind: Deployment
metadata:
name: nginx-deployment
spec:
selector:
matchLabels:
app: nginx
replicas: 2
template:
metadata:
labels:
app: nginx
spec:
readinessGates:
- conditionType: target-health.alb.ingress.k8s.aws/nginx-ingress_nginx-service_80
containers:
- name: nginx
image: nginx
ports:
- containerPort: 80
Now, if you are using a controller later than v2, the readiness gate configuration can be injected into the Pod spec automatically by adding the label elbv2.k8s.aws/pod-readiness-gate-inject: enabled to your Kubernetes namespace.
$ kubectl create namespace readiness
namespace/readiness created
$ kubectl label namespace readiness elbv2.k8s.aws/pod-readiness-gate-inject=enabled
namespace/readiness labeled
$ kubectl describe namespace readiness
Name: readiness
Labels: elbv2.k8s.aws/pod-readiness-gate-inject=enabled
Annotations: <none>
Status: Active
So defining the legacy readinessGates and conditionType fields is not required if you are using a controller later than v2.0. If you have a Pod spec with a legacy readiness gate configuration, make sure you label the namespace and create the Service/Ingress objects before applying the Pod/Deployment manifest. The controller will remove all legacy readiness gate configuration and add the new one during Pod creation.
For existing connections (as mentioned in workflow step 4), the case involves graceful shutdown/termination handling in Kubernetes, so it requires using the mechanism Kubernetes provides.
You can use the Pod lifecycle preStop hook to add a pause (for example, with the sleep command) before Pod termination. This trick ensures the ALB has some time to completely remove the old targets from the target group (it is recommended to make the pause longer than your Deregistration delay):
lifecycle:
preStop:
exec:
command: ["/bin/sh", "-c", "sleep 40"]
terminationGracePeriodSeconds: 70
Note: if a container has a preStop hook configured, it runs before the container enters the Terminated state. If the preStop hook needs longer to complete than the default grace period allows, you must increase terminationGracePeriodSeconds to suit it.
First apply the label to the namespace so the controller can automatically inject the readiness gate:
apiVersion: v1
kind: Namespace
metadata:
name: 2048-game
labels:
elbv2.k8s.aws/pod-readiness-gate-inject: enabled
---
apiVersion: apps/v1
kind: Deployment
metadata:
name: "2048-deployment"
namespace: "2048-game"
spec:
selector:
matchLabels:
app: "2048"
replicas: 5
template:
metadata:
labels:
app: "2048"
spec:
# This is optional if you are using a controller after v2.x
readinessGates:
- conditionType: target-health.alb.ingress.k8s.aws/2048-ingress_service-2048_80
terminationGracePeriodSeconds: 70
containers:
- image: alexwhen/docker-2048
imagePullPolicy: Always
name: "2048"
ports:
- containerPort: 80
lifecycle:
preStop:
exec:
command: ["/bin/sh", "-c", "sleep 40"]
Here is an example from when I gave this practice a try. The deployment applies the feature, and you can see the status of the readiness gates:
$ kubectl get pods -n 2048-game -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
2048-deployment-99b6fb474-c97ht 1/1 Running 0 78s 192.168.14.209 XXXXXXXXXXXXXXXXXXXXXXXXXXXXXX.compute.internal <none> 1/1
2048-deployment-99b6fb474-dcxfs 1/1 Running 0 78s 192.168.31.47 XXXXXXXXXXXXXXXXXXXXXXXXXXXXXX.compute.internal <none> 1/1
2048-deployment-99b6fb474-kvhhh 1/1 Running 0 54s 192.168.29.6 XXXXXXXXXXXXXXXXXXXXXXXXXXXXXX.compute.internal <none> 1/1
2048-deployment-99b6fb474-vhjbg 1/1 Running 0 54s 192.168.18.161 XXXXXXXXXXXXXXXXXXXXXXXXXXXXXX.compute.internal <none> 1/1
2048-deployment-99b6fb474-xfd5q 1/1 Running 0 78s 192.168.16.183 XXXXXXXXXXXXXXXXXXXXXXXXXXXXXX.compute.internal <none> 1/1
Once the new version of the container image is rolled out, the deployment goes smoothly and prevents the downtime issue mentioned in the previous paragraphs:
In my scenario, Kubernetes needs to take at least a 40-second termination period for each Pod, so the old targets are moved out gradually instead of all being removed at once within a few seconds, until the target group contains only new targets.
Therefore, you probably also need to pay attention to the Deregistration delay defined in your ELB target group, which can be updated through the annotation:
alb.ingress.kubernetes.io/target-group-attributes: deregistration_delay.timeout_seconds=30
In this case, it is recommended to set it to less than 40 seconds so the ELB can drain your old targets before the Pod completely shuts down.
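The three timers in play here have to be ordered correctly. A tiny sanity check for the invariant this section builds up, using the values from the manifests above (a 30-second deregistration delay, a 40-second preStop sleep, and a 70-second grace period):

```python
def graceful_termination_ok(deregistration_delay, prestop_sleep, grace_period):
    """Invariant used in this article: the ELB must finish draining
    (deregistration_delay) before the preStop pause ends, and the preStop
    pause must end before Kubernetes force-kills the Pod (grace_period)."""
    return deregistration_delay < prestop_sleep < grace_period

print(graceful_termination_ok(30, 40, 70))  # True
print(graceful_termination_ok(60, 40, 70))  # False: draining outlives the preStop pause
```

If you change any one of the three values, it is worth re-checking the ordering, since a drain that outlives the preStop pause reintroduces the 502s described earlier.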
With this configuration, clients get normal responses from the old Pods/existing connections during the deployment:
HTTPCode=200_TotalTime=0.012028
2048
HTTPCode=200_TotalTime=0.005383
2048
HTTPCode=200_TotalTime=0.010174
2048
HTTPCode=200_TotalTime=0.012233
Welcome to nginx!
HTTPCode=200_TotalTime=0.007116
2048
HTTPCode=200_TotalTime=0.010090
2048
HTTPCode=200_TotalTime=0.012201
2048
HTTPCode=200_TotalTime=0.005532
2048
HTTPCode=200_TotalTime=0.010107
2048
HTTPCode=200_TotalTime=0.012163
Welcome to nginx!
HTTPCode=200_TotalTime=0.005452
Welcome to nginx!
HTTPCode=200_TotalTime=0.009950
2048
HTTPCode=200_TotalTime=0.012082
Welcome to nginx!
HTTPCode=200_TotalTime=0.005349
2048
HTTPCode=200_TotalTime=0.010142
2048
HTTPCode=200_TotalTime=0.012143
2048
HTTPCode=200_TotalTime=0.005507
...
HTTPCode=200_TotalTime=0.012149
Welcome to nginx!
HTTPCode=200_TotalTime=0.005364
Welcome to nginx!
HTTPCode=200_TotalTime=0.010021
Welcome to nginx!
HTTPCode=200_TotalTime=0.012092
Welcome to nginx!
HTTPCode=200_TotalTime=0.005463
Welcome to nginx!
HTTPCode=200_TotalTime=0.010136
Welcome to nginx!
This is the practice for doing a graceful RollingUpdate deployment with the AWS Load Balancer Controller. However, what type of application you are rolling out is another big topic that needs to be discussed, because some applications need to hold long-lived connections through the ELB, or need persistent data stored on the backend; all of these bring out other issues we would need to talk about.
In summary, along with the deployment strategy above, it is also recommended to design the client and backend applications to be stateless, and to implement retries and fault tolerance. These methods usually help reduce customer complaints and provide a better user experience for the most common use cases.
Due to the current design of Kubernetes, a state-inconsistency issue is involved when you expose a service with an Application Load Balancer. In this article, I covered the potential issues when doing a rolling update of a container service integrated with the AWS Load Balancer Controller (ALB Ingress Controller).
Even though the technology keeps evolving, I am still willing to help people better handle their deployment strategy. I spent a couple of hours drafting this content, trying to cover several major issues, mention the things you need to be aware of, break down the entire workflow, and share a few practical suggestions that can be achieved with the AWS Load Balancer Controller to reach the goal of zero-downtime deployment.
This article is based on my own experience (and, of course, many back-and-forth conversations with different customers using AWS). It might not be perfect, but I hope it is helpful to you. Of course, if you find any typo or have any suggestions, please feel free to leave a comment below.
Even when building your own payment-flow integration, applying these payment services on top of cloud platforms still takes some time to explore for users who are unfamiliar with cloud technologies.
To implement a credit card payment mechanism on AWS by integrating ECPay, and to simplify management and maintenance, the following introduces the implementation details on AWS using Serverless technology as the backdrop.
The Serverless computing concept proposes abandoning traditional server management. In the past, you needed to maintain and manage the underlying compute systems running your application; Serverless proposes a Platform-as-a-Service (PaaS) model of operation, providing a simple and easy-to-operate micro architecture so you do not need to deploy, configure, or manage servers. You only need to push your code to the platform using a Serverless solution, and the cloud platform provides all the server-side services required to run it.
Since AWS launched its Serverless services in 2014, Serverless has become a popular runtime architecture among IT deployment solutions; learning Serverless will help you ship different types of applications more easily and quickly, and put your ideas into practice.
In the past, to run a payment service I might have needed a virtual machine providing the business logic 24 hours a day, and unexpected situations could add a lot of extra work: a sudden surge of order requests or excessive load, for example, could affect the business. Writing the application is usually only the starting point; the system maintenance work afterwards is the bigger challenge.
One of the considerations in choosing a Serverless architecture is the long-term maintainability of deploying, configuring, and managing servers, which is especially critical for a key business function like payments.
Generally speaking, building a Serverless-based application on AWS involves several key services; for this payment system, I used the following AWS services:
AWS Lambda is a serverless compute service that lets you run code without provisioning or managing servers, creating workload-aware cluster scaling logic, maintaining event integrations, or managing runtimes. With Lambda, you can run virtually any type of application or backend service, all without administration.
Amazon API Gateway is a managed AWS service that makes it easy for developers to create, publish, maintain, monitor, and secure APIs at any scale. APIs act as the "front door" for applications to access data, business logic, or functionality from your backend services. With API Gateway, you can create applications such as RESTful APIs. API Gateway supports serverless workloads and web applications, and handles all the tasks involved in accepting and processing up to hundreds of thousands of concurrent API calls, including traffic management, CORS support, authorization and access control. API Gateway has no minimum fees or startup costs; you pay for the API calls you receive and the amount of data transferred.
To make it easier to integrate ECPay payments on AWS under a Serverless architecture, I used the AWS Serverless Application Model (SAM) as a key tool in the development workflow to build the serverless application services.
The Serverless Application Model (SAM) provides a set of simple, declarative methods; in many cases, you can build a serverless application through SAM without being deeply familiar with the configuration of each individual AWS service.
To help you quickly understand how the Serverless Application Model (SAM) works, the following short 10-minute video introduces its workflow:
The user flow above describes how users and the individual AWS services interact. Taking order creation as an example, we can use the SDK provided by ECPay to design the checkout operations on AWS: expose a consistent external API endpoint through API Gateway, implement the order-creation logic as a Python application, and deploy it to AWS Lambda through the CLI tool provided by the Serverless Application Model (SAM CLI).
Under this architecture, we only need to focus on designing the checkout flow and the user flow; everything else about operating the services can be delegated to AWS Serverless solutions to meet the business requirements.
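To make the order-creation Lambda behind API Gateway concrete, here is a minimal, hypothetical sketch of such a handler. The field names (`MerchantTradeNo`, `TotalAmount`, `ItemName`) mirror common ECPay order parameters, but they are illustrative; the actual signing and checkout-form generation would be done with the ECPay SDK, which is omitted here:

```python
import json

def lambda_handler(event, context):
    """Hypothetical create-order endpoint behind API Gateway.

    A real integration would pass these parameters to the ECPay SDK to
    sign them and render an auto-submitting checkout form; this sketch
    only echoes the assembled order back as JSON.
    """
    body = json.loads(event.get("body") or "{}")
    order = {
        "MerchantTradeNo": body.get("trade_no", "demo0001"),
        "TotalAmount": body.get("amount", 100),
        "ItemName": body.get("item", "test item"),
    }
    return {
        "statusCode": 200,
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps(order),
    }

resp = lambda_handler({"body": json.dumps({"amount": 250, "item": "book"})}, None)
print(resp["statusCode"])  # 200
```

With SAM, a handler like this would be declared as an `AWS::Serverless::Function` resource with an API event source and deployed through the SAM CLI.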
If you are interested in the concrete implementation, you can get more information through the links below:
From zero to hero: learn the AWS fundamentals, dive deep into Serverless services and architectures, and learn to use different AWS solutions to practice serverless technology and run your own cloud payment system.
This post gave an overview of the implementation flow for building a payment application on an AWS Serverless architecture, touched on the mechanisms for integrating ECPay on AWS, and shared a reference architecture. If you are interested in the full implementation details, you can use the link below to get the complete content:
If you found this content helpful, please give it a Like / leave a comment below to let me know.
Here are some common load balancing solutions that can be applied on Amazon EKS:
This is the easiest way to provision your Elastic Load Balancer resource: use the default Kubernetes Service deployment with type: LoadBalancer. In most cases, the in-tree controller can quickly spin up the load balancer for experimental purposes, or serve production workloads.
However, you need to be aware of the problems we mentioned in the previous posts 1 2, because this generally adds a hop to your load balancing behavior on AWS and increases the complexity of your traffic.
In addition, be aware that this method only applies to creating a Classic Load Balancer or a Network Load Balancer (by using an annotation 3).
If you are using the nginx Ingress controller on AWS, it will deploy a Network Load Balancer (NLB) to expose the NGINX Ingress controller behind a Service of type=LoadBalancer. Here is an example of deploying the Kubernetes Service for nginx Ingress controller 1.1.3:
apiVersion: v1
kind: Service
metadata:
annotations:
service.beta.kubernetes.io/aws-load-balancer-backend-protocol: tcp
service.beta.kubernetes.io/aws-load-balancer-cross-zone-load-balancing-enabled: "true"
service.beta.kubernetes.io/aws-load-balancer-type: nlb
labels:
app.kubernetes.io/component: controller
app.kubernetes.io/instance: ingress-nginx
app.kubernetes.io/name: ingress-nginx
app.kubernetes.io/part-of: ingress-nginx
app.kubernetes.io/version: 1.1.3
name: ingress-nginx-controller
namespace: ingress-nginx
spec:
externalTrafficPolicy: Local
ports:
- appProtocol: http
name: http
port: 80
protocol: TCP
targetPort: http
- appProtocol: https
name: https
port: 443
protocol: TCP
targetPort: https
selector:
app.kubernetes.io/component: controller
app.kubernetes.io/instance: ingress-nginx
app.kubernetes.io/name: ingress-nginx
type: LoadBalancer
Guess what: it still relies on the in-tree controller, so the problems we mentioned can persist. It can be hard to predict which Pods will receive the traffic; but the main issue is that an Ingress controller does not typically eliminate the need for an external load balancer, it simply adds an additional layer of routing and control behind the load balancer.
So why choose the nginx Ingress controller? The reason is probably the one mentioned in post 4 on the AWS Blog:
The AWS Load Balancer Controller is similar to the in-tree Kubernetes controller and uses native AWS APIs to provision and manage Elastic Load Balancers. The controller is an open-source project originally named ALB Ingress Controller, because at the initial stage it only provided the capability to manage Application Load Balancers; it was later officially renamed the AWS Load Balancer Controller 5, and it is maintained by the AWS product team and the open-source community.
Unlike the in-tree Kubernetes controller, which has to wait for upstream code updates and requires you to upgrade the Kubernetes control plane version whenever the controller has a bug or a new ELB feature needs support, the AWS Load Balancer Controller can be upgraded gracefully because it runs as a Kubernetes Deployment instead of relying on Kubernetes upstream source code integration.
The controller directly maintains your Elastic Load Balancer resources with up-to-date annotations. The nginx Ingress controller provisions and adds an extra load balancing layer with a Network Load Balancer, so the traffic generally passes through the controller itself (nginx-ingress); the AWS Load Balancer Controller, by contrast, does not act as a gateway. It directly controls the Elastic Load Balancer resource, and can register your Pods (using IP mode) so requests are forwarded directly to your backend application.
The AWS Load Balancer Controller has also supported the TargetGroupBinding 6 and IngressGroup 7 features since v2.2. IngressGroup lets you group multiple Ingress resources together, which allows multiple Service deployments to share the same Elastic Load Balancer resource.
Comparing the different load balancer controllers, generally speaking, the AWS Load Balancer Controller has better feature support and can adopt performance optimizations by configuring the Elastic Load Balancer attributes correctly. It is essential to enable IP mode when applying a Kubernetes Service deployment with the AWS Load Balancer Controller, to remove the unnecessary hop that can be caused by Kubernetes networking itself, which is generally not a perfect fit for AWS networking and Elastic Load Balancing features.
However, the disadvantage of the AWS Load Balancer Controller is that every feature must be supported by the Elastic Load Balancer itself, because the controller does not add extra functions to extend traffic control. Other controllers still have their own benefits and provide features the Elastic Load Balancer does not have; with the nginx Ingress controller, for example, you may be able to forward a service to external FastCGI targets, use regular expressions for path matching, etc.
By the end of this article, I hope the comparison and information better help you understand how to select a load balancer controller to run in Amazon EKS, and choose the right option for your environment.
Thanks for reading! If you have any feedback or opinions, please feel free to leave the comment below.
[AWS][EKS] Best practice load balancing - Let’s start with an example from Kubernetes document ↩
[AWS][EKS] Best practice load balancing - imbalanced problem ↩
in-tree controller - Network Load Balancer support on AWS ↩
Using a Network Load Balancer with the NGINX Ingress Controller on Amazon EKS ↩
AWS Load Balancer controller v2.2 - TargetGroupBinding ↩
AWS Load Balancer controller v2.2 - IngressGroup ↩
Following the example from the previous article: suppose you deployed a Kubernetes Service and noticed that the utilization of your backend application is not balanced; or, while using the AWS Load Balancer Controller, Traefik, or the nginx-ingress controller, you found that the Elastic Load Balancer did not separate the load correctly (when using instance mode to register your Pods as targets). That imbalanced traffic is the major topic this article would like to talk about: how to improve and optimize it.
Let’s say I am deploying 4 Pods in my Kubernetes cluster, using the default deployment below to expose my Kubernetes Service:
$ kubectl get pod -o wide
NAME READY STATUS RESTARTS AGE IP
nginx-deployment-594764c789-5s668 1/1 Running 0 30m 192.168.42.171
nginx-deployment-594764c789-9k949 1/1 Running 0 30m 192.168.39.194
nginx-deployment-594764c789-b292m 1/1 Running 0 33m 192.168.29.24
nginx-deployment-594764c789-s226c 1/1 Running 0 30m 192.168.15.158
The Kubernetes service:
apiVersion: v1
kind: Service
metadata:
name: nginx-svc
labels:
app: nginx
spec:
type: LoadBalancer
ports:
- port: 80
protocol: TCP
selector:
app: nginx
To better illustrate the problem described in this post, the application I deployed responds with the Pod IP address, so we know which Pod received each request:
After running a loop and making at least 79 HTTP requests in my test, I got the following responses, showing how the load was distributed:
- 192.168.42.171: 12 times
- 192.168.39.194: 33 times
- 192.168.29.24: 23 times
- 192.168.15.158: 10 times
According to this test, we can see the load is not very evenly distributed.
As mentioned in the previous post, whether you define externalTrafficPolicy=Cluster or externalTrafficPolicy=Local, the routing behavior relies on iptables (or ipvs) and can be unpredictable, because it performs a second layer of load balancing, which needlessly adds a hop inside the AWS VPC.
The Elastic Load Balancer in AWS already provides a straightforward solution to balance your load, and its algorithm tries to distribute requests across all backend servers as evenly as possible. Doing load balancing inside the Kubernetes network generally increases the complexity of your architecture and makes traffic hard to trace; or, even worse, causes the imbalance you can observe here.
This also makes the load balancing unpredictable. Although the traffic sent to the registered EC2 instances can be evenly distributed, that does not mean the load is separated evenly across the Pods as well. You never know which Pod will be routed to, because of this load balancing layer implemented by Kubernetes networking.
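A quick simulation illustrates why. Assume a hypothetical layout of 4 Pods spread unevenly across 2 registered instances, with a node-local random pick as the second hop (as with externalTrafficPolicy=Local); the names and numbers are made up for illustration:

```python
import random

random.seed(0)

# Hypothetical layout: 2 registered instances, 4 Pods spread unevenly.
pods_per_node = {"node-a": ["pod-1"], "node-b": ["pod-2", "pod-3", "pod-4"]}

def instance_mode(requests=10_000):
    """ELB splits requests evenly across instances; kube-proxy then picks a
    random Pod on the receiving node (externalTrafficPolicy=Local)."""
    hits = {}
    for _ in range(requests):
        node = random.choice(list(pods_per_node))  # ELB's even split
        pod = random.choice(pods_per_node[node])   # iptables random mode
        hits[pod] = hits.get(pod, 0) + 1
    return hits

def ip_mode(requests=10_000):
    """IP mode registers every Pod directly, so the ELB spreads the load evenly."""
    all_pods = [p for pods in pods_per_node.values() for p in pods]
    hits = {}
    for _ in range(requests):
        pod = random.choice(all_pods)
        hits[pod] = hits.get(pod, 0) + 1
    return hits

print(instance_mode())  # pod-1 takes roughly half of the traffic
print(ip_mode())        # each Pod takes roughly a quarter
```

Under instance mode, pod-1 serves about 50% of the requests on its own, the same kind of skew as the measurement above; registering Pods directly restores the even split.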
Whether you choose Traefik or nginx-ingress, if you still follow the default load balancing pattern offered by upstream Kubernetes code, you can expect the traffic to come out imbalanced.
The major problem is that the default behavior involves Kubernetes load balancing and adds a hop to the traffic. So you may start to wonder how to resolve this; there is no specific Kubernetes setting that removes the default load balancing, but it is still possible to skip it and forward the traffic to the Pods directly.
If you are running Pods on Amazon EKS with the default AWS VPC CNI plugin1, your Pods should each have a dedicated secondary private IP address reachable within your AWS VPC network; this also means those IP addresses can be registered with your Elastic Load Balancer as backend targets. The flow becomes:
Client -> NLB (forward request to IP target) -> Pod IPs (reach the Pods directly)
Both the Application Load Balancer (ALB) and the Network Load Balancer (NLB) provide a feature to register backend targets by IP address (NLB, ALB; note: the Classic Load Balancer does not offer this option). We can simply register these Pod IP addresses as backend targets instead of instances. As long as the Pod IP addresses are reachable, requests are forwarded to the backend Pods, skipping the Kubernetes load balancing behavior.
So how do you register Pod IP addresses with the Elastic Load Balancer? A seamless way is to deploy your Kubernetes Service and use the AWS Load Balancer Controller2 to enable this feature. Instead of the default Kubernetes controller, the AWS Load Balancer Controller helps you manage the load balancer resource, including all its feature functionality, and it supports both load balancer types, NLB and ALB. After installing the AWS Load Balancer Controller on your EKS cluster, you can enable the IP registration type for your Pods simply by adding annotations to the deployment manifests.
Here is a deployment sample that uses IP targets with Pods deployed on Amazon EC2 nodes. Your Kubernetes Service must be created with type LoadBalancer:
apiVersion: v1
kind: Service
metadata:
name: my-service
annotations:
service.beta.kubernetes.io/aws-load-balancer-type: "external"
service.beta.kubernetes.io/aws-load-balancer-nlb-target-type: "ip"
...
spec:
type: LoadBalancer
...
To deploy an Application Load Balancer on Amazon EKS through the AWS Load Balancer Controller, you generally create an Ingress object in your deployment. The AWS Load Balancer Controller also provides a supported annotation that registers Pods as targets for the ALB, so traffic reaching the ALB is routed directly to the Pods behind your Service. Here is an example:
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
namespace: game-2048
name: ingress-2048
annotations:
alb.ingress.kubernetes.io/scheme: internet-facing
alb.ingress.kubernetes.io/target-type: ip
...
The AWS EKS documentation also contains detailed guides on deploying these two load balancers, and shares examples of using IP targets to register your Pods. If you are interested in learning more, please check out the following documents for more detail:
Using IP mode completely removes the load balancing layer manipulated by Kubernetes, so requests are forwarded to the Pods without a second hop:
This time I used the same testing strategy as in the first problem description, and ran four Pods behind a Network Load Balancer using IP mode, as shown below:
$ kubectl get pods -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
nginx-deployment-75d48f6698-b5fm7 1/1 Running 0 35m 192.168.17.15 ip-192-168-5-38.ap-northeast-1.compute.internal <none> <none>
nginx-deployment-75d48f6698-l4gw5 1/1 Running 0 2m45s 192.168.27.143 ip-192-168-5-38.ap-northeast-1.compute.internal <none> <none>
nginx-deployment-75d48f6698-q2q57 1/1 Running 0 41m 192.168.22.126 ip-192-168-5-38.ap-northeast-1.compute.internal <none> <none>
nginx-deployment-75d48f6698-x5m25 1/1 Running 0 2m45s 192.168.14.48 ip-192-168-5-38.ap-northeast-1.compute.internal <none> <none>
After sending at least 50 requests, I could see the request distribution shown below:
- 192.168.17.15: 10 times
- 192.168.27.143: 12 times
- 192.168.22.126: 14 times
- 192.168.14.48: 13 times
Each target has roughly a 25% chance of being routed to by the Network Load Balancer. Because this skips the Kubernetes load balancing layer, it follows the ELB routing algorithm3 and separates the load evenly, as we expected.
In my testing, I ran a couple of Pods with the nginx image serving a simple web server in the backend. The scenario in this article generally assumes all targets use stateless HTTP connections. However, in some cases, the ELB might still route traffic unequally to your targets if:
Generally speaking, if the client or any configuration causes sticky sessions, it is still possible to get imbalanced traffic. For details, refer to the following article in the AWS Knowledge Center:
Overall, though, using IP mode to register our Pods genuinely resolves the problem we described, which stems from the design of Kubernetes Service networking.
Although the Elastic Load Balancer offers an option to register your targets by instance, that is generally suitable when you run a single service exposed on a port of a dedicated EC2 instance. With a Kubernetes workload running on your EC2 instances but exposed as a NodePort Service, multiple Pods can sit behind the service port offered on your instance because of service load balancing: when packets flood into the instance, the destination field can be rewritten to another Pod’s private IP address by the Linux ipvs or iptables rules.
If the workload relies on a Kubernetes Deployment, it is recommended to register targets in IP mode through annotations such as service.beta.kubernetes.io/aws-load-balancer-nlb-target-type for NLB and alb.ingress.kubernetes.io/target-type for ALB.
It is also important to make sure the Elastic Load Balancer does not stick your client sessions to a specific target4 5. The Elastic Load Balancer provides cookie-based session stickiness to bind a user’s session to a specific target, which can be achieved by configuring the load balancer attributes and is also supported by the AWS Load Balancer Controller as below; but to mitigate imbalanced traffic, it is recommended to avoid sticky sessions, as they can cause exactly this phenomenon.
# ALB
alb.ingress.kubernetes.io/target-group-attributes: stickiness.enabled=true,stickiness.lb_cookie.duration_seconds=60
# NLB
service.beta.kubernetes.io/aws-load-balancer-target-group-attributes: stickiness.enabled=true,stickiness.type=source_ip
The ELB also needs to strike a balance across your Availability Zones to ensure the service’s high availability; this helps your traffic be separated correctly across all backend targets.
This article explained practices for optimizing load balancing and mitigating the imbalanced traffic problem when deploying services with Kubernetes. It also gave an overview of the other scenarios in which the ELB might route traffic unequally to your backend targets.
In the next article, we will review a couple of Kubernetes load balancer controllers that can be deployed on Amazon EKS, and see which option is the best practice for your environment.
How Elastic Load Balancing works - Routing algorithm ↩
externalTrafficPolicy
On many occasions, I have seen Kubernetes administrators who are not very familiar with the Kubernetes network flow and who struggle when they need to diagnose networking issues, especially users of managed Kubernetes cluster services. I think this gap is normal: it reflects how well Kubernetes does its encapsulation, so you cannot easily troubleshoot any real-world failure unless you deeply understand its design.
Before walking through the details of load balancing, you need to understand the fundamentals of Kubernetes load balancing and the effect of this setting when defining your YAML files.
Kubernetes provides an external traffic policy: you can set the spec.externalTrafficPolicy field in your Kubernetes Service deployment to control the flow and decide how traffic from outside is routed. Kubernetes offers two options for this policy, Cluster and Local; let’s take a deep look at how each works:
By default, kube-proxy performs this layer of load balancing using iptables. Based on the Pods you are running, it creates rules in your iptables and uses random mode (--mode random) to perform probability-based load balancing. For example, if you have 3 Pods to distribute across, kube-proxy takes responsibility for adding the required iptables rules with defined probabilities, trying to balance the load:
I am not going to drill down into too much detail, as it would increase the complexity of this article; however, if you are interested in how this translation happens, you can review the iptables rules on your host to see what is going on.
# An example of iptables rules
-A KUBE-SVC-XXXXX -m comment --comment "default/app" -m statistic --mode random --probability 0.20000000019 -j KUBE-SEP-AAAAAA
-A KUBE-SVC-XXXXX -m comment --comment "default/app" -m statistic --mode random --probability 0.25000000000 -j KUBE-SEP-BBBBBB
-A KUBE-SVC-XXXXX -m comment --comment "default/app" -m statistic --mode random --probability 0.33332999982 -j KUBE-SEP-CCCCCC
-A KUBE-SVC-XXXXX -m comment --comment "default/app" -m statistic --mode random --probability 0.50000000000 -j KUBE-SEP-DDDDDD
-A KUBE-SVC-XXXXX -m comment --comment "default/app" -j KUBE-SEP-EEEEEE
As shown in Figure 1, with `externalTrafficPolicy=Cluster`, traffic can be routed to other Nodes if the Service's Pods are deployed on them. Relying on iptables rules, this policy achieves load balancing by redirecting traffic to other Nodes; in other words, traffic may jump off the Node that originally received it.
With `externalTrafficPolicy=Local`, traffic can only be redirected within the same Node; however, load balancing through iptables still happens. If multiple Pods run on a single Node, traffic can be routed to any one of them.
Let's look at an example from the official Kubernetes documentation1:
apiVersion: v1
kind: Service
metadata:
  name: nginx-svc
  labels:
    app: nginx
spec:
  type: LoadBalancer
  ports:
    - port: 80
      protocol: TCP
  selector:
    app: nginx
If you use AWS as the cloud provider and deploy this Service, it will generally create an Elastic Load Balancer (Classic Load Balancer) to load-balance the traffic. The Elastic Load Balancer is managed by the in-tree load balancer controller2, which is implemented in the Kubernetes source code, so you can provision an Elastic Load Balancer on AWS seamlessly.
Looks familiar, right? The example above is quite common in tutorials; it may be exactly the configuration running in your production environment.
But here is the problem: by default, Kubernetes adds another layer of load balancing, backed by `kube-proxy`. Say you have two worker nodes (`Node-1` and `Node-2`), each running Pods (`Pod-1` and `Pod-2` on `Node-1`; `Pod-3` on `Node-2`), with the default option (`externalTrafficPolicy=Cluster`). On AWS, the traffic flow generally looks like this:
The default Kubernetes Service exposes your application on a specific service port for external access (`NodePort`) and sets up the relevant iptables rules to perform NAT translation by rewriting the destination IP address.
With this design, the happy case is when `kube-proxy` does not redirect the request to another host, which can be outlined as:
client -> Load Balancer -> Node-1 (NodePort) -> iptables rules -> Pod-1 on Node-1
However, what if iptables forwards the traffic to another Node?
client -> Load Balancer -> Node-1 (NodePort) -> iptables rules -> Pod-3 on Node-2
In other words, if you deploy a Kubernetes Service like this, traffic can be routed along either of these two paths.
As you can see, either way the behavior does not provide a better route, because it inevitably increases the number of hops in the traffic flow.
What about `externalTrafficPolicy: Local`? Does it work better?
Following the example in the previous section, say you have two Pods (`Pod-1` and `Pod-2`) running on the same Node (`Node-1`). The traffic flow under this policy generally breaks down as follows:
client -> Load Balancer -> Node-1 (NodePort) -> iptables rules -> Node-1 (Target Pod-1)
client -> Load Balancer -> Node-1 (NodePort) -> iptables rules -> Node-1 (Target Pod-2)
When the load balancer sends the request to the backend (`Node-1`), the iptables rules forward it to `Pod-1` or `Pod-2` with a 50% chance each.
Meanwhile, the traffic first passes through the Elastic Load Balancer and is then routed again at the system level (iptables), which means the architecture performs load balancing twice.
Clearly, this does not offer the best path for traffic routing.
If `externalTrafficPolicy=Local` and you have multiple Nodes behind your Elastic Load Balancer, you will probably see some Nodes fail the health check, which is expected:
a Node that does not run any of the Service's backend Pods cannot pass it.
In general this doesn't hurt anything, because the ELB ensures traffic is only routed to healthy targets; in this case, however, the Elastic Load Balancer does not distribute the load evenly or offer high availability when we have multiple Pods. If a Node goes down, it impacts all Pods running on it.
So, does that make `externalTrafficPolicy=Cluster` the better option?
Imagine a long-running connection that jumps off the first Node, and that Node then runs into trouble (hardware failure, intermittent connectivity problems, etc.) and eventually goes down. Any existing connections forwarded through it from other Nodes will be impacted and unable to respond back to the origin correctly. In general, a Node going down can cause packet loss for any connection whose route was established through it:
(Assuming you have an established connection passing through the Node, here is an example of the route breaking when `Node-1` in the middle goes down.)
client -> Load Balancer -> Node-1 (NodePort) -> iptables rules -> Target Pod-2 on Node-2
If you review the flow in Figure 3, connections can be routed along different paths, which becomes hard to predict once you have deployed many Pods. It also increases the complexity of tracing the network flow when diagnosing a problem.
At large scale (e.g. 100, 500, or even 10,000 Pods), this can also cause system-level issues or packet loss: network latency increases because the kernel must evaluate many iptables rules whenever a new connection comes in, and you can hit kernel limits in the networking stack, since the Linux kernel has to track connections when working with iptables and installs the rules at the system level. One common issue as the scale grows is filling up the Linux kernel's connection tracking table (conntrack).
This article explained the behavior of load balancing in Kubernetes, along with an overview of the issues that can occur if you follow the default Kubernetes example to deploy your Elastic Load Balancer.
Now that we have a deeper understanding of Kubernetes load balancing, the next article will discuss the load-imbalance problem with the current architecture on Amazon EKS.
Kubernetes service - Type LoadBalancer ↩
Kubernetes source code - aws_loadbalancer.go ↩
NEX WORK was built by a group of passionate Taiwanese engineers at the NEX Foundation, with the goal of lowering the high barriers to overseas job hunting and increasing visibility, connecting overseas Taiwanese talent around the world, building a sustainable mechanism, and fostering a positive talent cycle.
NEX Foundation was founded in 2018 as a 501(c)(3) non-profit charitable startup approved by the US federal government. Based in Seattle and Taipei, NEX builds and operates online resource platforms to support the international career development of overseas talent. It further acts as a bridge connecting Taiwanese talent active around the world, driving initiatives such as company referrals, career counseling, media experiments, and community meetups, with the aim of building a sustainable global Taiwanese talent support network.
NEX grew out of its team members' own overseas stories: stepping out of a familiar hometown into a culturally foreign place, learning to survive in a highly competitive environment, and staying strong in the face of new challenges. For overseas Taiwanese who lack resources and local connections, the road ahead is often full of difficulties and insecurity.
Through NEX, we hope to use the power of mutual support to lift up the next dreamer, creating more opportunities and helping more dreams come true for those who need it.
The foundation was initially started by HW. Chen and a group of passionate friends working in the US, who began the preparatory work for the NEX Foundation. (Meet the team experts and volunteers contributing around the world.)
In December 2018, the foundation was approved by the US Internal Revenue Service (IRS) and officially became a federal-level non-profit educational charity.
NEX's primary mission is to build a trusted platform that connects Taiwanese talent globally, systematically integrates existing resources, and drives support programs (such as company referral integration, mentorship programs, and scholarships). It hopes to reverse the negative perception of Taiwan's "brain drain" and instead act as a positive force that unites everyone's strength and resources to help more dreams and achievements come true.
NEX Work continues to invest in many non-profit projects, including regular community events and sharing sessions, opening doors to overseas careers for more Taiwanese. You can follow the links below for more information:
NEX WORK is a non-profit online job referral platform, currently still in Beta, aiming to connect talent around the world and create the power of solidarity and mutual support.
Especially given the increasingly competitive job market, visa processes unfriendly to foreigners, and other communities helping each other grow (sometimes even skirting the edges of the law), we believe helping our own is both natural and imperative.1
As a driving force supporting the career development of Taiwanese talent, team members work on the NEX WORK project in their spare time, continuously collecting user feedback and making improvements.
Through NEX's referral system, Taiwanese can help fellow Taiwanese open the first door, or sprint the last mile, of an international career. NEX WORK quickly connects the supply and demand sides of job hunting, helping you find an insider at the company; buying time means buying opportunities.
As an initial experimental platform, NEX WORK has already gathered many Taiwanese at well-known companies worldwide who voluntarily offer referral channels (including myself).
Besides overseas Taiwanese at well-known tech companies who voluntarily offer referrals (e.g. Facebook (Meta), Amazon, Apple, Google, Dropbox, Cisco), the platform also covers well-known accounting firms, cryptocurrency exchanges, and more. If you are interested in offering referrals, see the information below on opening a channel through the registration system.
As a first step, you can visit the NEX WORK platform via the following link:
The interface is quite intuitive (if you find it hard to use, please share your feedback via the Feedback form on the right). Scroll down to see which companies are listed and how many referrers each one has.
To help you quickly understand the platform's features, the following provides more details for two usage scenarios:
You can register an account via the sign-up button in the upper-right corner (or click here to register), fill in your basic information, and complete email verification.
After registering, you can choose the company you want a referral for; the following uses Amazon as an example:
Click through to see the referrers offering referral channels and view more information.
To avoid sending your referral request into a void with an inactive referrer, you can also check several key indicators in the system, such as:
Click "Refer me" to fill in the required basic information and upload your resume:
After completing the application, click "View my referral records" to check your referral information:
Note: NEX WORK does not guarantee that a referrer will refer you. Referrers still decide whether to refer you based on your experience and the materials you provide; this is not blind, indiscriminate referring.
Please provide a truthful, detailed summary of your experience along with the necessary information; the person helping you will review your materials and decide whether to spend time on the referral.
If your resume lacks sufficient experience, or you did not provide the required information in the notes, a referrer may still decline your request in order to maintain referral quality and avoid being blacklisted by HR.
If you are unsure how to get started with your resume format, here are some example templates:
Note that the resume examples above generally apply to US companies (e.g. no profile photo or personal summary needed), but conventions vary by country; there are many resources online, so adjust to your own situation.
A referral usually cannot guarantee an interview or an offer (in my own experience helping with referrals, very few candidates end up hired).
However, company referral programs usually give you a much better chance of being seen by the recruiting team than mass-applying online, and may even shorten the time you wait for an initial response.
Referrals are opportunities that referrers usually offer proactively, and they often take extra time and effort (getting to know your background privately, following up with HR on progress).
Even after all that extra time, not getting hired is still very common given the many factors a hiring company weighs, so please remember to stay polite and thank every overseas Taiwanese who helps you.
You can use the sign-up button in the upper-right corner and click "Register as a referrer" (or click here to register as a referrer) to fill in your basic information and create an account:
In the "Company name" field, select or add the company you can currently refer for to complete registration:
You can further edit your profile so that people seeking referrals can get to know you better. Once the required information is filled in, candidates seeking referrals can view it on the homepage and submit the necessary materials through the channel you provide.
Whenever there is a new referral request, you can check the pending requests under "View referral applications":
Note: if you are a referrer, the NEX WORK engineering team has received user feedback about "referral records" versus "referral applications" — one is the record of referrals you seek, the other is the record of referrals you help with — so please don't mix them up.
I hope this article helps you better understand the NEX WORK platform. If you have any suggestions about NEX WORK, feel free to reach out via the Feedback form on the right or the contact methods below, so we can make NEX WORK better together:
At NEX Foundation we also believe that "today's passerby is tomorrow's guide." In the spirit of Give and Take, if you are willing to support us or join the global volunteer team and start a positive cycle2, you can help more Taiwanese reach the world and make the road home better. Learn more via the links below:
I also contribute at NEX Foundation to connecting Taiwanese talent with career opportunities on the international stage. You can submit a referral application with your CV through NEX WORK (registration required) to bring in more outstanding people like you, or contact me via my LinkedIn.
If you found this content helpful, please hit Like or leave a comment below to let me know.
Since I have no formal academic training, corrections and suggestions from any experts willing to share their insights are more than welcome.
After reading the Firecracker paper, my conclusion about the project is essentially a design that says: "only a child chooses; I want both."
The paper explains that, in designing AWS Firecracker, the compatibility and security trade-offs between hypervisor-based virtualization and Linux containers meant that picking either one alone could not satisfy the engineering goals of AWS's infrastructure. Firecracker therefore breaks out of that dilemma: it takes on the role of a VMM (Virtual Machine Monitor) and incorporates the advantages of various existing mechanisms to meet the needs of compute virtualization.
Implementors of serverless and container services can choose between hypervisor-based virtualization (and the potentially unacceptable overhead related to it), and Linux containers (and the related compatibility vs. security tradeoffs). We built Firecracker because we didn’t want to choose.
AWS has already adopted Firecracker in two public serverless services, AWS Lambda and AWS Fargate, supporting millions of users and trillions of requests per month. More details about Firecracker are described below.
(For the original paper and my own highlights, see1.)
Since Firecracker is an extension of operating-system virtualization technology, it touches many details of Linux virtualization. Before reading on, you should first understand some basic concepts and terminology:
A hypervisor can be viewed as software, a system, or firmware that manages virtual machines. Virtualization allows us to run multiple systems, possibly with different OS kernels, on a single computer, placing each in a virtual execution environment (a virtual machine). The hypervisor's job is to manage these virtual machines; the computer running one or more virtual machines is usually called the host, and the virtual machines are called guests.
In 1974, Gerald J. Popek and Robert P. Goldberg defined two types of hypervisors2, Type 1 and Type 2:
Installing a guest OS inside a virtual machine does not mean it can directly use all of the host OS's resources (e.g. disk writes, CPU time, I/O). Typically, the hypervisor "emulates" these devices so the guest OS believes it can use them, while the virtualization layer actually translates and schedules these operations for the host OS to handle.
Essentially the same as a hypervisor: as its name (Virtual Machine Monitor) suggests, a VMM is designed to create, monitor, and manage virtual machines, and to trap the I/O operations (disk writes, network traffic, etc.) performed inside them.
QEMU is an open-source VMM. Because QEMU is implemented purely in software and sits between the guest machine and the host machine, handling the guest's hardware requests and translating them to real hardware, it suffers from some performance issues.
KVM (Kernel-based Virtual Machine) is a virtualization technology supported by the Linux kernel. It turns the Linux kernel into a usable VMM, effectively converting the system into a Type 1 (bare-metal) hypervisor, so you can run multiple isolated virtual environments (VMs) on a Linux system. KVM has long been part of the mainline Linux kernel, and because it is a kernel feature, it can usually handle I/O with near-native performance.
crosvm is a Google open-source project (Chrome OS Virtual Machine Monitor) used for virtualization in Chrome OS. Built on the Linux KVM hypervisor, it is used in Android and Chrome OS-based systems. Unlike QEMU, it does not emulate real hardware devices directly; instead, it uses the Linux paravirtualized device standard (virtio) to emulate the devices in a VM. The Firecracker paper specifically mentions that the implementation started from crosvm as its base.
cgroup is a Linux kernel feature mainly used to limit the resources used by processes running in a container environment (e.g. CPU, memory, and disk I/O). cgroups are also heavily used by Linux container technologies such as Kubernetes and Docker.
Firecracker's design explicitly does not trust the guest OS's resource usage. Because the guest OS is controlled by the customer, its behavior cannot be assumed reasonable; Firecracker therefore also uses built-in Linux mechanisms such as cgroups to cap the total resources available to the VMM and each virtual machine.
seccomp is a Linux kernel feature used to restrict which system calls (syscalls) a process running in a container can make. Think of it as a whitelist of allowed Linux functions: while the process runs, only specific syscalls are permitted.
The same mechanism is also used in some container technologies, such as Docker's default seccomp profile.
Historically, AWS's mainstream serverless services gave customers another managed way to run applications: users no longer need to manage the underlying machines or handle security patching themselves.
The most representative AWS service is AWS Lambda. If you don't know what it is, AWS Lambda is a serverless compute service: you upload your code, choose a configuration, and run it, without worrying about hardware specs or maintenance.
It also scales dynamically with usage in large-scale scenarios. However, when AWS Lambda first launched, it used Linux containers to isolate different customers' execution environments (similar to Docker). That mechanism constrained customers' runtimes (they had to use instruction sets supported by the host OS's kernel version), and because the kernel was shared, it also carried some security risk.
When we first built AWS Lambda, we chose to use Linux containers to isolate functions, and virtualization to isolate between customer accounts.
The paper therefore lists six key considerations Firecracker evaluated when designing with virtualization:
Under these conditions, section 2.1 of the paper discusses and evaluates several existing virtualization technologies, including:
So, against this background of evaluating and comparing mainstream virtualization technologies, AWS Firecracker borrowed from many solutions and struck an appropriate balance among them. Moreover, many AWS teams run their operational infrastructure on Linux, which shaped this decision in Firecracker's design philosophy. More importantly, Firecracker reuses features the Linux kernel already supports, rather than reimplementing them, precisely because these features are battle-tested, high-quality, and mature (e.g. the scheduler, the TUN/TAP network interface), and because they let AWS teams keep using familiar Linux tools and operational workflows for debugging. For example, `ps` can list the microVMs running on a machine, and other standard Linux tools (`top`, `vmstat`, even `kill`) manage Firecracker as expected.
For this reason, Firecracker uses KVM as its primary virtualization foundation and implements a VMM (Virtual Machine Monitor) component to manage the KVM execution environment.
Our other philosophy in implementing Firecracker was to rely on components built into Linux rather than re-implementing our own, where the Linux components offer the right features, performance, and design
It performs hardware-level virtualization (HVM) and resource allocation, e.g. CPU, memory management, and paging.
Firecracker's implementation started from Google's crosvm and removed a large number of unnecessary devices, such as USB, GPU, and the 9p filesystem protocol (Plan 9 Filesystem Protocol). On that base, Firecracker added roughly 20k lines of code, mostly in Rust, modified roughly 30k lines, and open-sourced the result.
Firecracker also emulates only a limited set of I/O devices: network cards, disks, serial ports, and i8042 support (the PS/2 keyboard controller). QEMU, by comparison, is far more complex, supporting more than 40 different devices, including USB, video, and audio devices.
The more detailed architecture is as follows:
Firecracker uses virtio to emulate its network and disk devices, which accounts for roughly 1,400 lines of Rust code. Firecracker also exposes a REST API, so any HTTP client (e.g. `curl`) can interact with it directly.
In summary, Firecracker aims to provide the following mechanisms3 4:
Firecracker's device model includes quota limiting: it can cap disk IOPS (I/O per second) and network PPS (packets per second). Firecracker provides an API to configure the resources available to a microVM, including CPU, disk I/O, and network throughput.
Its rate limiting builds on virtio's own support; for a network device, the configuration (`rx_rate_limiter`) can look like this:
PATCH /network-interfaces/iface_1 HTTP/1.1
Host: localhost
Content-Type: application/json
Accept: application/json
{
  "iface_id": "iface_1",
  "rx_rate_limiter": {
    "bandwidth": {
      "size": 1048576,
      "refill_time": 1000
    },
    "ops": {
      "size": 2000,
      "refill_time": 1000
    }
  }
}
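As a small Python sketch of the same request (the helper name is mine, and the socket path is an assumption — Firecracker's API is typically served over a Unix socket), this builds the PATCH body shown above:

```python
import json

def rx_rate_limiter_patch(iface_id, bandwidth_bytes, ops, refill_ms=1000):
    """Build the PATCH body that caps a network interface's receive rate.

    Both limits are token buckets: `size` tokens, refilled every
    `refill_time` milliseconds.
    """
    return {
        "iface_id": iface_id,
        "rx_rate_limiter": {
            "bandwidth": {"size": bandwidth_bytes, "refill_time": refill_ms},
            "ops": {"size": ops, "refill_time": refill_ms},
        },
    }

body = rx_rate_limiter_patch("iface_1", 1048576, 2000)
print(json.dumps(body, indent=2))
```

The body would then be sent with something like `curl --unix-socket /tmp/firecracker.socket -X PATCH http://localhost/network-interfaces/iface_1 -d @body.json` (paths illustrative).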
To harden security, Firecracker deployments must mitigate potential vulnerabilities arising from the Linux kernel or from virtualization itself, such as Intel Meltdown, Spectre, and ZombieLoad. To address this concern in production, Firecracker follows several deployment practices:
The recommended production deployment practices for Firecracker are also listed in the following document:
Additionally, to prevent unexpected behavior while the Firecracker VMM is running (for example, a security vulnerability allowing malicious code injection), Firecracker adds another layer of sandboxing for extra isolation, called the Jailer in Firecracker's design.
That said, the concrete implementation described in the paper still uses techniques from Linux containers, including:
The chroot directory configured for the jailer sandbox contains only the compiled Firecracker binary, `/dev/net/tun`, cgroup control files, and the resources the microVM needs. By default, the seccomp-bpf profile whitelists 24 syscalls and 30 ioctls.
From my own research, if I understand correctly, Firecracker seems to have added quite a few more syscalls to the seccomp filter in recent versions:
After Firecracker was created, AWS gradually adopted it in AWS Lambda's underlying architecture. With AWS Firecracker, each Lambda execution node (Lambda worker) can run thousands of microVMs.
AWS Lambda's architecture, from top to bottom, looks like this:
(1) A user triggers a Lambda function through an event via the Frontend service (API Gateway or other sources); the Worker Manager assigns an available execution machine (Lambda Worker).
(2) Once triggered, the Frontend service hands off to the Worker Manager, which follows a sticky-routing scheduling algorithm to keep invocations pinned to a particular Lambda Worker as much as possible, and directs the invoke service to forward the request payload straight to the target Lambda Worker, reducing invocation latency and round-trips.
(3) Each Lambda Worker provides an abstraction called a slot, which holds the customer's pre-loaded Lambda function code; subsequent invocations reuse this execution environment (slot) whenever possible.
The key is how Firecracker is deployed within a Lambda Worker. Each Lambda Worker can be viewed as a bare-metal machine running the Firecracker VMM, which manages many microVMs (Lambda function slots); each microVM contains the customer's execution environment (sandbox) and application code, plus a shim control process that talks to the MicroManager over a TCP/IP socket.
(The MicroManager can be seen as the component where the Lambda data plane and control plane interact.)
MicroManager provides slot management and locking APIs to placement, and an event invoke API to the Frontend
The MicroManager also keeps a small pool of pre-booted microVMs to serve placement requests immediately. Even though Firecracker can boot in under 125ms, that may still not be fast enough for AWS Lambda customers' rapid scale-out needs and could partially block user invocations, so in practice a pre-warm mechanism like this exists.
When an application running in AWS Lambda performs a write (say, an application in the guest OS wants to write a file to disk), the operation is handed to the `virtio` driver, which places it in shared memory and buffers it in a ring buffer. Firecracker is then woken up to perform the I/O and actually write the data to the physical disk.6
The paper notes that starting in 2018, AWS migrated AWS Lambda customers from an EC2 container-based platform (one container per function) to Firecracker. During the migration there were no availability outages, latency regressions, or other metric-level problems.
That said, the migration exposed some small issues for internal AWS teams. For example, Symmetric MultiThreading (SMT) was disabled for the security reasons mentioned earlier (it had been enabled in the old deployment), which exposed a thread-related bug in applications using Apache HttpClient that had been present in older AWS SDK versions; it was resolved by patching the dependency.
Once internal AWS teams finished migrating, AWS began gradually moving external customers' infrastructure to a Firecracker-based foundation, with great success.
Also, with future security patches and system updates in mind, traditional package managers such as `rpm` and `yum` introduce too many variables and can cause software consistency issues, so the AWS team adopted an immutable infrastructure strategy: launch new EC2 instances from a new AMI (Amazon Machine Image, the boot image for EC2) and replace the old instances.
The paper presents several benchmark results, and the corresponding data was also presented at NSDI 20207.
The tests below use an EC2 `m5d.metal` instance type with an Intel Xeon Platinum 8175M processor (48 cores, hyper-threading disabled), 384GB RAM, and four 840GB NVMe disks.
The host OS in these tests is Ubuntu 18.04.2 with Linux kernel 4.15.0-1044-aws.
The tests compare several major virtualization technologies: Firecracker v0.20.0, pre-configured Firecracker, Intel Cloud Hypervisor, and QEMU v4.2.0.
Boot time is defined as the time from the VMM process being forked until the guest kernel starts the first `init` process.
The data shows that both Firecracker with pre-configured I/O ports and Intel Cloud Hypervisor boot faster than QEMU. Note, however, that these results exclude network device setup; once a network device is configured, Firecracker and Cloud Hypervisor each add about 20ms to boot time, while QEMU adds 35ms.
For memory overhead (Figure 7), QEMU itself needs 128MB, Cloud Hypervisor about 13MB, while Firecracker needs only about 3MB.
Notably, for file I/O performance (Figures 8 & 9), the study used `fio` to run the tests. Even though the hardware can sustain more than 340,000 read IOPS (1GB/s at 4kB), Firecracker and Cloud Hypervisor are limited to about 13,000 IOPS (52MB/s at 4kB) of throughput.
The study also used `iperf3` for network performance tests (against a virtual tap interface with a 1500-byte MTU). While the machine can reach 44Gb/s on a single flow and 46Gb/s with 10 parallel flows, Firecracker achieves only about 15Gb/s of throughput. QEMU scores close to Cloud Hypervisor, both with better network throughput; this is partly attributed to limitations of the `virtio` device implementation.
As the performance evaluation above notes, because of the virtio-based implementation, Firecracker cannot achieve near-bare-metal I/O throughput via direct PCI device access, so network and disk I/O performance is somewhat limited.
Still, the study concludes, echoing the six key considerations mentioned earlier, that AWS Firecracker genuinely achieves its engineering design goals, including:
Beyond providing virtualization solutions for some open-source projects (e.g. Kata Containers), AWS Firecracker is already used in AWS's own foundational products, including AWS Fargate and AWS Lambda.
This infrastructure improvement also brings customers major advantages: thanks to Firecracker's design, AWS Fargate's compute pricing was cut by up to 50% (AWS Fargate Price Reduction – Up to 50%).
Out of interest in this technology, I read the entire Firecracker paper and parts of the Firecracker project, summarized the above, and spent some time putting this walkthrough together. More information:
I hope this walkthrough helps you better understand AWS Firecracker.
If you found this content helpful, please hit Like or leave a comment below to let me know.
Alexandru Agache, Marc Brooker, Andreea Florescu, Alexandra Iordache, Anthony Liguori, Rolf Neugebauer, Phil Piwonka, Diana-Maria Popa. (2020). Firecracker: Lightweight virtualization for serverless applications ↩
Gerald J. Popek, Robert P. Goldberg. (1974). Formal requirements for virtualizable third generation architectures ↩
AWS re:Invent 2019: Firecracker open-source innovation (OPN402) ↩
AWS re:Invent 2020: Deep dive into AWS Lambda security: Function isolation ↩