Categories

Sunday, May 14, 2017

With Cosmos DB, Microsoft wants to build one database to rule them all

“We want this to be the database of the future and to last for many decades to come,” Microsoft Technical Fellow Dharma Shukla told me when we talked about Cosmos DB, the new globally distributed database the company is launching at its Build developer conference today. And for Shukla, this is a project that started seven years ago when he started prototyping what a globally distributed (or “planet-scale,” as Microsoft often likes to call it) database would look like. This project (which at the time was called “Project Florence”) first turned into DocumentDB, Azure’s NoSQL database service, which launched in 2015 and is now morphing into Cosmos DB.
Cosmos DB is, in Shukla’s words, “a major leap forward” from what DocumentDB was able to offer. DocumentDB only offered a subset of the capabilities of what is now Cosmos DB. While DocumentDB was essentially a store for JSON data, Cosmos DB goes much further. It extends the idea of an index-free database system and adds support for various new data types that give Cosmos DB enough flexibility to work as a graph database or key-value database, for example. And for those who are looking to store more traditional columnar relational data, Cosmos DB will offer support for that, too.

All of this is driven by a mantra that you’ll almost inevitably hear when you talk to somebody from Microsoft’s developer division these days: “We want to meet developers wherever they are.” So while you could also use the MongoDB APIs to access your data in DocumentDB, Cosmos DB also features support for SQL, Gremlin and Azure Tables — and the team plans to launch a large number of similar driver and translation layers in the near future.

“No data is born relational,” Shukla told me. “In the real world, nobody thinks in terms of schemas — they think graphs or maybe JSON document if you’re an IOT device. […] We want to make sure that the systems we build have a common engine to efficiently map different data models.”
Given its heritage, it’s no surprise that Cosmos DB takes many of its cues from DocumentDB. One of those is the availability of tunable consistency models. If you don’t spend your days thinking about globally distributed databases, consistency models may not seem all that important to you, but most competing database systems (including Google’s recently launched Cloud Spanner) only feature two consistency models: strong consistency and eventual consistency. With strong consistency, whenever data is written to the database, all the different nodes (which could be spread across data centers around the globe) have to agree on the new value before it becomes visible in an app, for example. That comes with some obvious performance trade-offs, given the added latency involved. Eventual consistency is essentially a more lenient system where the nodes don’t all update simultaneously and instead converge on a value once no new updates have arrived for a while.

“What is unusual about Cosmos DB is that it provides different consistency models where the user gets a trade-off between how much consistency he gets over how much of a performance hit he takes,” Leslie Lamport, the Turing Award winner whose work underpins many of these concepts (and who also wrote the LaTeX document preparation system) and who joined Microsoft Research in 2001, told me. Cosmos DB offers five different flavors of consistency for different use cases. “Those kind of intermediary consistency guarantees sort of have been around in academic systems that people build to write papers around,” Lamport explained. Cosmos DB is among the first commercially available database systems to offer this variety of consistency models.
For Cosmos DB (and previously for DocumentDB), that means you can choose between a consistency model where reads are allowed to lag behind writes by only a certain number of milliseconds, for example, or a model that focuses on offering consistency for a specific client session (in a Twitter-like app, for example), but where it’s not all that important that every user sees every write at the same time (or even in the exact same order).
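To make that concrete, here is a minimal sketch (not from the article) of how a client might request one of these consistency levels with the DocumentDB .NET SDK that Cosmos DB inherits; the account endpoint and key below are placeholders.

using System;
using Microsoft.Azure.Documents;
using Microsoft.Azure.Documents.Client;

class ConsistencyDemo
{
    static void Main()
    {
        // Placeholder endpoint and key -- substitute your own account values.
        var endpoint = new Uri("https://your-account.documents.azure.com:443/");
        const string authKey = "<your-auth-key>";

        // Ask for session consistency: reads within this client's session see its own
        // writes, trading some global consistency for lower latency.
        var client = new DocumentClient(
            endpoint,
            authKey,
            new ConnectionPolicy(),          // default connection options
            ConsistencyLevel.Session);       // the consistency choice for this client

        // The client can now be used for reads and writes under that consistency level.
    }
}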



As Shukla noted, though, his idea here was to build a database system that could last decades. To do this, he also brought in Lamport to teach the team TLA+. Lamport has long had a special interest in how developers spec out their applications. TLA+ is essentially a formal language for doing just that. “When we started out in 2010, we wanted to build a system — a lasting system. This was the database of the future for Microsoft,” Shukla told me. “We try to apply as much rigor to our engineering as we possibly can. […] TLA+ has been wonderful in getting that level of rigor in a team of engineers to set the bar high for quality.” TLA+, Lamport noted, allows you to do the high-level design of a system in a completely formal way — and because it’s done formally, it can be checked for correctness, too (and to be fair, AWS and others also use TLA+ to spec out their distributed systems). “I don’t want to give the impression that TLA+ is great and I’m brilliant. It’s great because it’s almost entirely based on mathematics,” Lamport added.
The Cosmos DB team feels strongly enough about its engineering chops that it is also offering a number of SLAs that are somewhat unusual in the database space. While it’s typical to guarantee a certain degree of uptime, Microsoft also offers SLAs for throughput, consistency and latency. “The thing I’m super proud of — and we feel this is the legacy — it’s done with careful craftsmanship,” said Shukla.
It’s worth noting that, as is now a tradition at Microsoft, Cosmos DB has long been used internally at Microsoft. In its previous incarnation as DocumentDB, Cosmos DB currently serves “tens of thousands of customers,” Microsoft says, and stores petabytes of data.

Cosmos DB is now live in 34 Azure regions and is generally available, with all the SLA promises that entails. All existing DocumentDB customers (and their data) will automatically become Cosmos DB users.

Source: https://techcrunch.com/2017/05/10/with-cosmos-db-microsoft-wants-to-build-one-database-to-rule-them-all/

Sunday, April 2, 2017

9 lies programmers tell themselves

Programmers have pride with good reason. No one else has the power to reach into a database and change reality. The more the world relies on computers to define how the world works, the more powerful coders become.
Alas, pride goeth before the fall. The power we share is very real, but it’s far from absolute, and it’s often hollow. In fact, it may always be hollow because there is no perfect piece of code. Sometimes we cross our fingers and set limits because computers, too, can be fallible, as we all know from too much firsthand experience.

Of course, many problems stem from assumptions we programmers make that simply aren’t correct. They’re usually sort of true some of the time, but that’s not the same as being true all of the time. As Mark Twain supposedly said, “It ain’t what you don’t know that gets you into trouble. It’s what you know for sure that just ain’t so.”
Kevin Deldycke’s GitHub-hosted list of falsehoods that programmers believe is a good example of how disconnected cyberspace can be from reality. It’s a compendium that will only grow as others contribute their war stories. Consider it a good kick in the pants itemizing a thousand examples that say, in essence, “Remember, Caesar, thou art mortal.”

My favorite may be the list of falsehoods about phone numbers. If you think that saving a phone number for a person is as simple as putting seven or maybe 10 digits in a database, you’re mistaken. That works until it doesn’t because there are country codes, abandoned numbers, and more than a dozen gotchas that make it hard to do a good job keeping a list of phone numbers. Is it any wonder that there’s a smug smile of satisfaction on the faces of the Luddites who keep their phone lists in a little black book?
Here are a number of false beliefs that we programmers often pretend are quite true.

Questions have one answer

The database table is filled with columns, and each column has an entry or it doesn’t. It’s either full or null. What’s so hard about matching up an answer for every question?
Alas, sometimes there is more than one answer, and then the table starts to fail. Maybe a person has more than one telephone number or a second weekend home. Database designers figured out some solutions to this by creating one-to-many and many-to-one mappings that can store multiple answers. Some of the more modern NoSQL solutions use a “document” model that lumps together all of the possible answers with different tags in one big soup.
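As a toy sketch of the one-to-many idea (the types and properties here are invented for illustration, not taken from any real schema), a person record points at any number of phone records instead of holding a single phone column:

using System.Collections.Generic;

// Hypothetical entities illustrating a one-to-many mapping: one Person row relates
// to any number of PhoneNumber rows, instead of a single "phone" column that can
// hold only one answer.
public class Person
{
    public int Id { get; set; }
    public string Name { get; set; }
    public List<PhoneNumber> PhoneNumbers { get; set; } = new List<PhoneNumber>();
}

public class PhoneNumber
{
    public int Id { get; set; }
    public int PersonId { get; set; }   // foreign key back to Person
    public string Label { get; set; }   // "home", "work", "weekend house"...
    public string Value { get; set; }   // stored as text: country codes, extensions, etc.
}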

These solutions are better, but even they have limits. Sometimes answers are valid only for a short window of time. A parking spot may be legal except during rush hour between 4 p.m. and 6 p.m.
If you think it’s enough to add one slot to the table to handle a window for each day, remember that sometimes there are several exceptions, like 7 a.m. to 9 a.m. and 4 p.m. to 6 p.m. And tracking the time of day isn’t enough either, because the parking rules are often different on weekends; but then the definition of a weekend changes. Parking is free in the District of Columbia on Sundays—but not Saturdays. Federal holidays are different too, and so are local ones.
Those are times only. The list of potential exceptions goes on and on, making it impossible to imagine that a database will ever model reality by storing the absolute and final answer to any question, no matter how simple.

Null is acceptable

Sometimes I think that half of the Java code I write is checking to see whether a pointer is null. When I’m feeling aggressive, I try to draw a perimeter around my library and test for null only at the entry methods, those locations where the API is open to the rest of the code. That simplifies things for a bit, but eventually I want to reach into the library and use a small method that’s sitting there. Oops. Now it needs to test for nullity and the perimeter has been breached. So much for building a wall.
Figuring out how to handle this issue is a big problem for modern language design. The clever way some languages use a question mark to check for nullity helps, but it doesn’t get rid of the issue. Null simply makes object-oriented programming much more confusing and prolix.
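For what the question-mark approach looks like in practice, here is a small C# sketch with made-up types; the null-conditional operator shortens the checks, but the null still has to be handled somewhere:

// Hypothetical Customer/Address types, purely for illustration.
public class Address { public string City { get; set; } }
public class Customer { public Address Address { get; set; } }

public static class NullChecks
{
    // The verbose way: test every link in the chain by hand.
    public static string CityOrDefaultVerbose(Customer customer)
    {
        if (customer != null && customer.Address != null && customer.Address.City != null)
            return customer.Address.City;
        return "unknown";
    }

    // The question-mark way: ?. short-circuits to null, ?? supplies a fallback.
    // Shorter, but the possibility of null hasn't gone away.
    public static string CityOrDefaultConcise(Customer customer)
        => customer?.Address?.City ?? "unknown";
}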

Human relationships can be codified

When gay marriage was legalized, one smart database administrator recognized that this was much bigger than the Y2K problem, which had almost paralyzed the country by asking programmers to go back and add two new digits to the year. To address it, the DBA considered how to handle the challenge with 14 progressively more accommodating database schemas, each more elaborate than the last. In the end, he concluded, “Perhaps the simplest solution would be to ban marriage outright.”
But tracking who is married to whom is only the beginning. Imagine you’re building a database table for a school for determining which adult can pick up a kid after school or maybe make a decision about administering aspirin. Sure, the birth mother is easy, but what about the stepparents? What about the step older sibling who’s back from college break and definitely remembers meeting the kid at their parents’ wedding last summer, at least before the bar opened?
You might be tempted to pull a Facebook and punt with an “it’s complicated” entry, but you can’t. These are legal questions that can produce lawsuits if the code isn’t accurate. By “accurate,” I mean conforms to the law, and we all know how accurate Congress can be when writing laws. But forget about blaming Washington. The kid needs aspirin. What will your database say?

“Unicode” stands for universal communication

There’s an earnest committee that meets frequently trying to decide which emojis should be included in the definitive list of glyphs that define human communication. They also toss aside certain emoji, effectively denying someone’s feelings.
The explosion in memes shows how futile this process can be. If the world finds emojis too limiting, spurring them to turn to mixing text with photos of cultural icons, how can any list of emojis be adequate?
Then there’s the problem of emoji fonts. What looks cute and cuddly in one font can look dastardly and suspect in another. You can choose the cute emoji, and your phone will dutifully send the Unicode bytes to your friend with a different brand phone and a different font that will render the bytes with the dastardly version of the emoji. Oops.

Numbers are accurate

As I type this, snow is falling across the Sierras from one of the biggest storms in some time. When I looked at the weather today, it looked sunny and cool, a perfect day for skiing. But some of the slopes are closed. Why? The new snow might bring avalanches, and the slopes can’t be opened until the crew clears the danger with explosives.
The basic numbers from the weather report (temperature, cloud cover, humidity) don’t capture some of the special details. Avalanche scientists have more complicated models that do a good job of predicting when the snow will tumble, but the reality is that numbers tell only part of the story. This is why the ski companies send out teams to trigger potential avalanches just in case.
The computer industry’s infatuation with numbers has only gotten deeper as the buzzwords “big data” get more popular. The hard disks are filled with trillions of numbers, so there should be algorithms that can extract something intelligent from all of these numbers.
In reality, numbers tell only very specific things. They’re often quite useful, but they’re far from completely accurate.

Human language is consistent

One of the ways that developers punt is to put in a text field and let humans fill it with whatever they want. The open-ended comment sections are made for humans and rarely interpreted by algorithms, so they’re not part of the problem.
The real problem resides in structured fields with text. When my GPS wants me to choose a road named after a saint, it tells me to “turn onto Street Johns Road.” Road names with apostrophes also throw it for a loop. It’s common to see “St. John’s Road” spelled as “Saint Johns,” “St. Johns,” “Saint John’s,” and even the plural form: “Saint Johns.” The U.S. Post Office has a canonical list of addresses without extra characters, and it maintains an elaborate algorithm for converting any random address into the canonical form.

Time is consistent

It may feel like time keeps flowing at a constant rate—and it does, but that’s not the problem for computers. It’s the humans that mess up the rules and make a programmer’s life nasty. You may think there are 24 hours in every day, but you better not write your code assuming that will always be true. If someone takes off on the East Coast of the United States and lands on the West Coast, that day lasts 27 hours.
Time zones are only the beginning. Daylight saving time adds and subtracts hours, but on weekends that change from year to year. In 2000 in the United States, the shift occurred in April. This year, the country changed clocks on the second Sunday in March. In the meantime, Europe moves to “summer time” on the last Sunday in March.
If you think that’s the end of it, you might be a programmer tired of writing code. Arizona doesn’t go on daylight saving time at all. The Navajo Nation, however, is a big part of Arizona, and it does change its clocks because it’s independent and able to decide these things for itself. So it does.
That’s not the end. The Hopi Nation lies inside the Navajo Nation, and perhaps to assert its independence from the Navajo, it does not change its clocks.
But wait, there’s more. The Navajo have a block of land inside the Hopi Nation, making it much harder to use geographic coordinates to accurately track the time in Arizona alone. Please don’t ask about Indiana.
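A quick way to see how little can be assumed is to ask .NET’s TimeZoneInfo about two Mountain-time zones. The sketch below uses the Windows zone IDs (“Mountain Standard Time” observes daylight saving, “US Mountain Standard Time” is Arizona and does not); on Linux or macOS the IANA names would be needed instead, which is itself part of the point.

using System;

class DaylightSavingCheck
{
    static void Main()
    {
        // Windows time zone IDs; on Linux/macOS the IANA names
        // ("America/Denver", "America/Phoenix") would be needed instead.
        var denver = TimeZoneInfo.FindSystemTimeZoneById("Mountain Standard Time");
        var arizona = TimeZoneInfo.FindSystemTimeZoneById("US Mountain Standard Time");

        Console.WriteLine($"Denver observes DST:  {denver.SupportsDaylightSavingTime}");   // True
        Console.WriteLine($"Arizona observes DST: {arizona.SupportsDaylightSavingTime}");  // False

        // The same UTC instant lands on different local clocks depending on the date.
        var julyNoonUtc = new DateTime(2017, 7, 1, 19, 0, 0, DateTimeKind.Utc);
        Console.WriteLine(TimeZoneInfo.ConvertTimeFromUtc(julyNoonUtc, denver));   // 13:00 local
        Console.WriteLine(TimeZoneInfo.ConvertTimeFromUtc(julyNoonUtc, arizona));  // 12:00 local
    }
}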

Files are consistent

It seems that merely remembering the data should be something a computer can do. We should be able to recover the bits even if the bits are filled with many logical, stylistic, orthographic, numerical, or other inconsistencies. Alas, we can’t even do that.

Whenever I ask my Mac to check the file system and fix mistakes, it invariably tells me about a long list of “permissions errors” that it dutifully repairs for me. How did the software get permission to change the permissions for access to my files if I didn’t give permission to do it? Don’t ask me.
The problems go deeper. About every six months, the built-in Mac backup software called Time Machine announces that the backup copy of everything has become corrupted and the only way to fix it is to rebuild the entire thing. Quick, though, before the main computer explodes and all the data is lost.
These are only two examples of how file systems don’t honor the compact between user (the person supplying the electricity) and the machine (desperate needer of electricity). Any programmer will tell you there are hundreds of other examples of situations where files don’t contain what we expect them to contain. Database companies are paid big bucks to make sure the data can be written in a consistent way. Even then, something goes wrong and the consultants get paid even more money to fix the tables that have gone south.

We’re in control

We like to believe that our instructions are telling the computer what to do, and that arrogant pride is generally justified, except when it’s not.
What? Certainly that may be true for the average nonprogramming saps, unwashed in the liniment of coding power, but not for us wizards of logic and arithmetic, right? Wrong. We’re all powerless beggars who are stuck taking whatever the machines give us. The operating system is in charge, and it may or may not let our code compute what it wants.
OK, what if we compile the Linux kernel from scratch and install only the code that we’ve vetted? Certainly we’re in control then.
Nope. The BIOS has first dibs over the computer, and it can surreptitiously make subtle and not-so-subtle changes in your code. If you’re running in the cloud, the hypervisor has even more power.
OK, what if we replace the BIOS with our own custom boot loader? You’re getting closer, but there’s still plenty of firmware buried inside your machine. Your disk drive, network card, and video card can all think for themselves, and they listen to their firmware first.
Even that little thumb drive has a built-in processor with its own code making its own decisions. All of these embedded processors have been caught harboring malware. The sad fact is that none of the transistors in that box under your desk report to you.

Friday, January 20, 2017

What's Next for .NET?

Key Takeaways

  • .NET is positioning itself for cross-platform development with .NET Core while .NET Standard 2.0 brings in the missing pieces
  • Streamlining the cross-platform tooling and educating the community to eliminate confusion is the next step to drive .NET Core and .NET Standard adoption.
  • Roslyn has a major impact on .NET, enabling new features to be delivered much faster. Roslyn also enables developers outside Microsoft to build their own tools based on its public APIs.
  • The .NET community is now warmed up to open source and increasingly contributing to compilers and system libraries.
A lot happened in the last year in the .NET ecosystem. Things are moving fast on several fronts: Xamarin, UWP, .NET Core, .NET native, F#, open source, etc.
Putting aside the details, the bigger picture is difficult to grasp. There is movement in all aspects: cross-platform, cloud, mobile, web apps and universal apps. Developers wonder where all this is going to lead and what will be required to get there.

The panelists:

  • Richard Lander - Principal Program Manager Lead on the .NET Team at Microsoft
  • Phillip Carter - Program Manager on the .NET team at Microsoft
  • Phil Haack - Engineering Director at GitHub
  • Miguel de Icaza - Distinguished Engineer at Microsoft
InfoQ: Where are .NET and its languages going and what are the challenges ahead?
Richard Lander: You can see the future of .NET by looking at the wide breadth of device and operating system support that .NET has today, including recent additions from .NET Core. You can build any kind of application with .NET, including mobile, web, desktop and IoT. What’s interesting about .NET is that it is in a very small group of development platforms that can run natively on many platforms, has highly productive and evolving languages and tools and Enterprise Support. This, in short, is the vision we have for .NET.
The choice to open source .NET Core has had a huge impact on .NET. We’ve seen large numbers of open source developers getting involved with .NET Core and related projects and seen a big upswing in general .NET activity on GitHub. We’ve also been surprised by significant corporate interest and engagement. A number of big and important companies have joined the .NET Foundation, like Samsung and Google. Something truly interesting is going on when you see other companies saying “.NET is important for our business … we need to get more involved.” You’ll see us continue to be open and collaborative and increase the ways in which we are doing that.
One of the big surprises in 2016 was the introduction of Visual Studio for Mac. It includes tools for both Xamarin and ASP.NET Core. The Visual Studio for Mac product is a very clear signal that Microsoft is serious about cross-platform development. We also have free tools options for Windows, Mac and Linux, making it super easy to get started with .NET.
The challenge is getting people to recognize that .NET is no longer Windows-only and that it has transitioned to a credible cross-platform development option that should be considered for your next project. We’ve made some huge changes in the last couple years, such as acquiring Xamarin, open sourcing .NET Core and building great cross-platform tools support. We have work left to do to earn people’s interest in the product, and that’s a key focus moving forward.
Philip Carter: For .NET, the biggest focus right now is .NET Standard Library 2.0 and having as great of an experience as possible using container technologies, like Docker.  .NET Standard Library 2.0 makes the vast majority of .NET APIs cross-platform, and gives developers a simple way to reason about the code they write.  If the only .NET APIs you take a dependency on are from the .NET Standard Library, your code is guaranteed to run anywhere that a .NET runtime does, with no extra work on your behalf.  This is the same for NuGet packages as well – if the dependency graph of your system ultimately depends on the .NET Standard Library, it will run everywhere.  That’s huge from a code-sharing point of view, and even more important for long-term flexibility.  Need to target Linux?  All your code which uses the .NET Standard Library runs there.  We’re also taking containers very seriously.  We want your experience with deploying code to a container to be as simple as possible, and we’re building the tooling to make that happen.
For the .NET languages, we’re focused on making tooling for our languages as good as possible out of the box.  We’re shipping some great productivity features in the forthcoming release of Visual Studio, and we’re focused on building even more in the future.  In terms of language features, C# and Visual Basic are focusing on continuing to add more of the functional programming features already found in F#, such as expression-based Pattern Matching and Record and Discriminated Union types, modified in ways which make sense.  Non-nullability is also a huge area of interest for us.  F# is specifically focusing more on better IDE tooling, as it already has these previously-mentioned features, but lacks the same quality tooling experience that C# and Visual Basic have.  In short, more features which continue to highlight functional programming, and better tooling for each language.
The biggest challenges ahead lie in the sheer amount of work involved in releasing all of the above as release-quality, supported software.
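As a small, illustrative taste of the direction Carter describes, the sketch below uses the kind of expression-based pattern matching planned for C# 7 (type patterns and when clauses in a switch); the shape types are invented for the example.

using System;

// Invented shape types, used only to demonstrate C# 7 pattern matching.
public abstract class Shape { }
public sealed class Circle : Shape { public double Radius; }
public sealed class Rectangle : Shape { public double Width, Height; }

public static class Geometry
{
    public static double Area(Shape shape)
    {
        // Type patterns bind the matched value to a new variable, and "when" clauses
        // refine the match -- a step toward the style familiar from F#'s match expressions.
        switch (shape)
        {
            case Circle c:
                return Math.PI * c.Radius * c.Radius;
            case Rectangle r when r.Width == r.Height:
                return r.Width * r.Width;          // a square
            case Rectangle r:
                return r.Width * r.Height;
            case null:
                throw new ArgumentNullException(nameof(shape));
            default:
                throw new ArgumentException("Unknown shape", nameof(shape));
        }
    }
}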
Phil Haack: C# is headed in a great direction. Now that the design is done in the open on GitHub, the community can follow along and contribute to its future. It's been adding a lot of features inspired by functional programming that'll make coding in it more delightful. F# continues to be a wonderful functional language that inspires a lot of features of C#, but seems to have trouble breaking into the mainstream.
.NET itself is headed towards being a compelling cross-platform choice for development, but it faces challenges in maintaining relevance and growth. While the number of jobs for C# developers is high, the amount that it's taught in schools, bootcamps, and code academies seems small compared to Node and JavaScript. While the .NET ecosystem is strong and growing, it's still heavily dependent on Microsoft. There seem to be few large companies contributing to its OSS projects, for example. And the number of packages on NPM dwarfs NuGet's. Expanding the independent community is important.
Another challenge is that at the end of the day, the lingua franca of the web is JavaScript. So in terms of being a cross-platform language, Node and JavaScript have a huge appeal because it's one less language to understand when building web applications. This explains the appeal of platforms like Electron where you can bring many of the web development skills you may already have over to native application development. Thus C# and F# have to make a compelling case to learn yet another language (JS in the front, C# in the back).
Miguel de Icaza: .NET continues a tradition that was set when it was first introduced. It is a framework that continuously evolves to match the needs of developers, that continues to have a strong interoperability story and that strives to blend productivity and performance at the same time.
Today, .NET is available on pretty much every platform in use: from servers to desktops to mobile devices, gaming consoles, virtual and augmented reality environments, watches, and even tiny embedded systems like the Raspberry Pi.
The entire framework has been open sourced under the most liberal terms possible which opens many doors - from becoming a core component of future Unix systems, to secret new devices being manufactured by the industry.
The blend of productivity and performance is one that is very important to me, as I first started working with .NET back in 2000, when computers had merely a fraction of the power that today's computers have, yet it delivered a high-performance runtime that assisted developers in creating robust software.  This was done by ensuring that a safe programming environment existed, one where common programming mistakes were avoided by design.
This blend has proved to be incredibly useful in a world where we carry portable computers in our pockets, and for game developers. Developers still want to create robust software, in a fraction of the time and with a fraction of the support, but running on devices that do not have as much power as a high-end computer.
As for challenges, these are probably the most interesting ones and I want to share examples on how the framework evolves along the lines that I outlined at the start.
Like I mentioned, one of the cultural strengths of .NET is that we have adapted the framework over the years to match the needs of the market, and those needs change constantly, from work that needs to be done at the very local level (for example, runtime optimizations) to the distributed level (higher-level frameworks).
In previous years we had to shrink the framework to fit on underpowered devices, so we created smart linkers, smarter code generation, and APIs that mapped to new hardware.
And this trend continues.  One example is the focus in the past year on enhancing .NET to empower a nascent class of users who develop high-performance server and client code.  This requires the introduction of new types, primitives and compiler optimizations in the stack.   On the other end of the spectrum, it is now simpler for .NET developers to create distributed systems, both with Microsoft-authored technology (Orleans, ServiceFabric) and with community-authored technology (MBrace).
On the interoperability side of the house, we have been working on various fronts.  We are working to make it simpler for .NET programmers to consume code written in other frameworks and languages as well as making it simple to consume .NET code from other languages (we already support first class C, C++, Java and Objective-C) as well as making it easier to communicate with services across the network with tools like the Azure AutoRest.
InfoQ: How has the emergence of Roslyn helped the growth of the .NET platform and your language? (C# / F# / VB .NET as appropriate)
Richard Lander: This is a really easy one, and I'm going to bend the question a bit to enable me to include the runtime in the equation. If you think of the .NET runtime, it enables these languages and Roslyn to exist, given the model (mostly garbage collection and type-safe memory) that they expose to developers. So, the absence of that would be C++ (ignoring other industry peers for the moment). The developers on the runtime team work in C++  so that Roslyn can exist and you can use C#. That is very charitable of them!
csc.exe (the pre-Roslyn C# compiler) was also written in C++, so the same model applies there.
It turns out that the developers who write the native components of the platform like C# better. News flash, eh? They actively find ways to do more of their job in C# and to convert more of their codebase to C#. It's a massive over-simplification, but you can think of Roslyn solely as a project to rewrite csc.exe in C#. At the same time, there has been an equally significant trend to rewrite runtime components in C#, too. Particularly for the runtime, it's a significant architectural effort to convert runtime components to C# since you have a bootstrapping problem, but it's worth it.
A C# code-base is hugely beneficial over C++ for a few reasons:
  • It vastly increases the size of the developer base that can contribute to the codebase.
  • We have excellent tools for C# that make development much more efficient.
  • It's straightforward to make a .NET binary work on other chips and operating systems. This is great for bringing up a codebase like .NET Core on something like Raspberry Pi.
  • It's easier to reason about security issues.
So, in my view, the primary trend is moving our existing C++ codebase to C#. It makes us so much more efficient and enables a broader set of  .NET developers to reason about the base platform more easily and also contribute.
Philip Carter: Roslyn has been big for us in growing .NET, helping us at Microsoft build better tools and offering developers a new platform to build a new class of tooling with deep, semantic understanding of their codebase.
From a language developer’s perspective, one of the immediate benefits of Roslyn is a modern architecture which allows for adding new language features far more easily than the previous compilers for C# and Visual Basic.  Roslyn also introduced Roslyn Workspaces, which is a cross-platform editor abstraction layer.  This is now used in Visual Studio and Visual Studio Code (via OmniSharp) to more easily utilize each language service.  Additionally, F# 4.1 will be the first version of F# which uses Roslyn Workspaces with its own language service, which opens the doors to a vast amount of IDE tooling improvements and new features.  This can position F# as the only functional programming language on the market with quality, first-class IDE tooling, which we believe will help grow .NET.  Roslyn Workspaces are the vehicle that allow us to ship better language features for all .NET languages.
Roslyn Analyzers help grow .NET by offering a set of APIs that allow you to build a new class of custom tooling for your C# and VB codebases.  The first improvement here is enabling people to build powerful static analysis tooling more easily, but you can take it a step further with something like a semantic diff tool, or other things which require an understanding of the semantic structure of your code.  This is a vector which, prior to Roslyn, was realistically only available for those who made money off static analysis tooling.  With Roslyn Analyzers, this area of development is now approachable and available to any .NET developer.
Phil Haack: It's had a huge impact. Prior to Roslyn, the idea of implementing a code analyzer was limited to a few who dared delve into the esoteric machinations necessary. It's democratized enhancing your compiler. It also has paved the way for people to be involved in language design. It's one thing to suggest a new language feature. It's another to also submit a pull request with the feature implemented so that others can try it out and see what they think.
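To give a feel for what "democratized" means here, the following toy sketch uses Roslyn's public APIs (the Microsoft.CodeAnalysis.CSharp package) to parse a snippet and list its method names; it is an illustration, not a production analyzer.

using System;
using System.Linq;
using Microsoft.CodeAnalysis.CSharp;
using Microsoft.CodeAnalysis.CSharp.Syntax;

class RoslynTour
{
    static void Main()
    {
        const string source = @"
            class Greeter
            {
                void SayHello() { }
                int Add(int a, int b) => a + b;
            }";

        // Parse the text into a syntax tree and walk it with plain LINQ.
        var tree = CSharpSyntaxTree.ParseText(source);
        var methods = tree.GetRoot()
                          .DescendantNodes()
                          .OfType<MethodDeclarationSyntax>();

        foreach (var method in methods)
            Console.WriteLine(method.Identifier.Text);   // SayHello, Add
    }
}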
Miguel de Icaza: It has helped unify the C# experience across all of our supported platforms (Visual Studio, VS Code and Xamarin Studio), but also across new development technologies that make it easy to write and try live code with Xamarin Workbooks or Continuous on the iPad.
For tools, developers had to resort to half-built, half-cooked language implementations or hacks.   In the C# world this is no longer necessary, as we now have a way for developers to fully grasp a C# program, manipulate it and explore it in the same way the language would.
F# is in a league of its own; it beautifully combines the worlds of .NET and functional programming. It is a language that has always been ahead of the curve and is great for data processing.  At a time when interest in machine learning is at an all-time high, it is just what the doctor ordered.
InfoQ: How should developers approach the official .NET platform, the .NET Core platform, and the Mono stacks? Is there a way to "keep up"? What should be used for new projects?
Richard Lander: A major focus for 2017 is reducing the number of things you need to keep up with. Initially, we made design choices with .NET Core that made it significantly different from other .NET platforms.  Since then, we have reverted those choices, making .NET Core much more similar to the rest of .NET. In the Visual Studio 2017 release, you will see .NET Core move to using the msbuild build engine, just like .NET Framework and Xamarin. That makes the experience of making .NET mobile and web projects work together, for example, a lot easier.
We are also standardizing the minimum set of APIs that all .NET platforms must have. We’re calling that “.NET Standard”. We’ll define .NET Standard 2.0 in 2017 and ship various implementations of it. It’s the biggest set of common APIs that we’ve ever defined and shipped by a large margin. It’s twice as big as the largest Portable Class Library profile. Once .NET Standard 2.0 is implemented by all the .NET platforms, many things get a lot easier. Many people are looking forward to that. It’s a game changer.
Philip Carter: This is an important question to answer, especially since we’ve been building so many different things over the years. Here’s the way I like to frame it:
.NET is a cross-platform development stack.  It has a standard library, called the .NET Standard Library, which contains a tremendous number of APIs.  This standard library is implemented by various .NET runtimes - .NET Framework, .NET Core, and Xamarin-flavored Mono.
.NET Framework is the same .NET Framework existing developers have always used.  It implements the .NET Standard Library, which means that any code which depends only on the .NET Standard Library can run on the .NET Framework.  It contains additional Windows-specific APIs, such as APIs for Windows desktop development with Windows Forms and WPF.  .NET Framework is optimized for building Windows desktop applications.
.NET Core is a new, cross-platform runtime optimized for server workloads.  It implements the .NET Standard Library, which means that any code which uses the .NET Standard Library can run on .NET Core.  It is the runtime that the new web development stack, ASP.NET Core, uses.  It is modern, efficient, and designed to handle server and cloud workloads at scale.
Xamarin-flavored Mono is the runtime used by Xamarin apps.  It implements the .NET Standard Library, which means that any code which depends only on the .NET Standard Library can run on Xamarin apps.  It contains additional APIs for iOS, Android, Xamarin.Forms, and Xamarin.Mac.  It is optimized for building mobile applications on iOS and Android.
Additionally, the tooling and languages are common across all runtimes starting with the forthcoming version of Visual Studio.  When you build a .NET project of any kind, you will use a new project system that remains compatible with the one existing .NET developers have used for years.  MSBuild is used under the covers for building projects, which means that existing build systems can be used for any new code, and any .NET Standard or .NET Core code can be used with existing .NET Framework code.  All the .NET languages compile and run the same way across each runtime.
What should you use for new projects?  It depends on your needs.  Windows desktop application?  .NET Framework, just like you’ve always used it.  Server or Web application?  ASP.NET Core, running on .NET Core.  Mobile application?  Xamarin.  Class Libraries and NuGet packages?  .NET Standard Library.  Using the standard library is critical for sharing your code across all your applications.
The best way to keep up is to keep your eyes on the official .NET Documentation.  The official .NET Blog also has numerous posts about this, but you’ll have to understand that many historical posts no longer describe the .NET landscape accurately.
Phil Haack: I tend to be fairly conservative. For production projects that my business depends on, I'd focus on the tried and true official .NET platform. But for my next side project, I'm definitely using .NET Core. It doesn't suffer the baggage of over a decade of backward compatibility requirements.
Miguel de Icaza: Thanks to .NET Standard, and especially the APIs we're delivering in .NET Standard 2.0, developers should not need to think too much about which runtime is running their app.  Those that keep up with the internals of .NET may be interested in understanding how we ended up with runtimes that are optimized to certain use cases (for instance, the years that have gone into optimizing Mono for mobile and games), but for the most part, developers just need to know that wherever they go, we've got them covered.
When it comes to choosing a runtime, the way to think about them is as follows:
  • .NET Framework is a Windows-centric framework that surfaces the best of Windows to developers. If you are building a Windows-centric application, this is what you will be using.
  • .NET Core is the cloud optimized engine and it is cross platform. It uses the same high-performance JIT compiler but runs your code on all the supported operating systems (Windows, Linux, macOS).  It does not ship with Windows specific APIs, as they would defeat the cross-platform objective.
  • Mono is the runtime used for mobile and Apple platforms (Android, iOS, watchOS, tvOS), gaming consoles and Unix desktop applications.
InfoQ: What features of other languages do you admire and would consider for (C# / F# / VB .NET as appropriate)
Richard Lander: One of the things I like about JavaScript (and other languages like it) is that you can just start writing code in a file and you have something runnable with a single line. There is no ceremony and no real concepts to learn (at first). That's valuable. There are some C# scripting solutions that are like that, too, but they are not well integrated. Certain language features are headed in this direction, but we're not there yet. I'd like to be able to have a single line C# file for a "Hello World" Web API. That would be awesome.
Being a runtime guy, I'm going to bend this question to the runtime again. I like JavaScript and PHP, for example, because they can be read and executed quickly from source. I also like Go because it produces a single-file native executable. .NET is one of very few platforms that can reasonably do both. I'd like to see us expose both of those options for .NET developers. It’s easy to see scenarios, particularly for cloud programming, where both options can be beneficial.
Philip Carter: One of the biggest features we admire for C# and Visual Basic is non-nullability.  One of the biggest problems out there, often coined the “billion-dollar mistake”, is null.  Every .NET developer out there has had to chase down bugs where they hadn’t properly checked for null in their codebase.  The ability to mark types as non-nullable eliminates that problem, allowing you to push concerns about null from a runtime problem to a compile-time problem.  Making this a reality is a serious challenge, however, because the corpus of C# and Visual Basic code out there today does not have non-nullability.  Thus, such a feature might have to be opt-in and cannot break existing code.  This is actually not much of a problem in F#, where the F# types you use in your codebase are already non-nullable by default, and nullability/non-nullability is already a language feature.  However, when interoperating with other .NET types, null is still a concern for F# developers.  We’re keenly interested in opt-in non-nullability.
Another set of language features that are interesting are those which enable better ad-hoc polymorphism; namely, Protocols/Traits and Type Classes.  These allow you to extend the functionality of existing types, and Type Classes in particular are even more flexible because they allow you to define behavior without needing to “pin” it to a particular type.  This makes things like equality semantics for an arbitrary type hierarchy much simpler than it is today.  While something like Protocols/Traits or Type Classes aren’t on our roadmap, they’re certainly interesting and do solve some of the more nuanced problems you can encounter with .NET languages today.
Miguel de Icaza: I am not responsible for the evolution of those languages, but as a user, I have a list of features that I would like C# and F# to incorporate.
For F#, my request is simple: I am a man that is too attached to his old ways of writing loops.   I want my loops to include support for break and continue. I know this is heresy, but that is what I desire the most :-)
For C#, I would like the language to continue incorporating many of the F# ideas, and in particular, I would like for it to introduce non-nullable references.
InfoQ: The open source shift at Microsoft has been underway for over a year. In what ways did it influence or change the .NET community?
Richard Lander: The open source shift of .NET began in 2008 with the release of the ASP.NET MVC source code, followed by Web API and SignalR, and finally the open sourcing of all of .NET Core in 2015. This has been a journey, and now over 50% of all .NET Core changes come from the community, and the number of C# repos on GitHub grows every day. This is a sea change from the past and paints a very nice picture for the future.
As open source maintainers, we try to do a great job. That breaks out a few ways: being welcoming to newcomers, providing decent repo documentation and instructions, holding a high bar on PRs and enabling community leaders to rise up and help run the project. In some cases, we've met the community’s expectations and in other cases we have conversations going on where they would like something different. In general, I think it is fair to say that the community is happy (and in many cases very happy) with how the .NET Core and related open source projects have worked out over the last couple years.
A positive and healthy open source platform project for the .NET ecosystem is hugely beneficial. It encourages more open source projects from a broader set of people. Effectively, it creates a different tone that wasn't possible before just because you can now say that .NET is an open source ecosystem. The feedback has been very positive, from the community, from small and big business and from the public sector.
Philip Carter: I think that the shift to open source has done a tremendous amount of good for the .NET community.  I think the greater .NET community is still warming up to open source, and might be a bit jarred by our sudden transition, but people are already seeing the value and contributing.  We’ve seen an uptick in community contributions across the board – even in our documentation, which is not normally associated with open source development, we have almost 100 non-Microsoft contributors.  One community member even reviews pull requests that Microsoft employees make on our own repository!  I am personally noticing a .NET community that feels empowered to contribute to their development stack, and I couldn’t be more excited about it.  This is only the beginning, too.  As we launch release-quality tooling in Visual Studio for .NET Standard and .NET Core (which is open source as well, by the way), I expect an increasing number of .NET developers to watch and contribute to .NET.
Phil Haack: I may be biased, but it's been underway for more than a year. It's only in the past year that it passed the tipping point. I think it's showing the community that Microsoft is dead serious about being a real open source player. It may still take time for that to sink in. After all, MS has been heavily antagonistic to open source for much longer than one year. But I think it's a net positive. The .NET community can now actively participate in the future of .NET in a manner that wasn't possible before. All of the new .NET core development and language design is being done in the public on GitHub and they accept contributions. This is good to see. More and more the community feels like a part of the effort and not a sideshow afterthought.
Miguel de Icaza: It has energized both the open source .NET community and those that merely consume .NET. The benefits of open source for a framework are on full display now that the framework has been opened up: you can see contributions to the codebase across the board, from performance to memory usage to improved precision and scalability. There is a virtuous cycle in full swing right now, as we nurture and grow the .NET community together.

Conclusion

Open source in .NET is now part of the landscape and can be expected to continue growing with .NET Core and .NET Standard 2.0. Microsoft is focusing its efforts on the cross-platform story, polishing the platform to appeal to non-Windows developers and platform implementers.

Thursday, January 19, 2017

.NET Core Image Processing

Image processing, and in particular image resizing, is a common requirement for web applications. As such, I wanted to paint a panorama of the options that exist for .NET Core to process images. For each option, I’ll give a code sample for image resizing, and I’ll outline interesting features. I’ll conclude with a comparison of the performance of the libraries, in terms of speed, size, and quality of the output.

CoreCompat.System.Drawing

If you have existing code relying on System.Drawing, using this library is clearly your fastest path to .NET Core and cross-platform bliss: the performance and quality are fine, and the API is exactly the same. The built-in System.Drawing APIs are the easiest way to process images with .NET Framework, but they rely on the GDI+ features from Windows, which are not included in .NET Core, and are a client technology that was never designed for multi-threaded server environments. There are locking issues that may make this solution unsuitable for your applications.
CoreCompat.System.Drawing is a .NET Core port of the Mono implementation of System.Drawing. Like System.Drawing in .NET Framework and in Mono, CoreCompat.System.Drawing also relies on GDI+ on Windows. Caution is therefore advised, for the same reasons.
Also, when using the library cross-platform, be careful to include the runtime.osx.10.10-x64.CoreCompat.System.Drawing and/or runtime.linux-x64.CoreCompat.System.Drawing packages.


// inputPath, path, outputDirectory and OutputPath(...) come from the surrounding benchmark harness.
using System;
using System.Drawing;
using System.Drawing.Drawing2D;
using System.Drawing.Imaging;
using System.IO;
using System.Linq;

const int size = 150;
const int quality = 75;

using (var image = new Bitmap(System.Drawing.Image.FromFile(inputPath)))
{
    int width, height;
    if (image.Width > image.Height)
    {
        width = size;
        height = Convert.ToInt32(image.Height * size / (double)image.Width);
    }
    else
    {
        width = Convert.ToInt32(image.Width * size / (double)image.Height);
        height = size;
    }

    var resized = new Bitmap(width, height);
    using (var graphics = Graphics.FromImage(resized))
    {
        graphics.CompositingQuality = CompositingQuality.HighSpeed;
        graphics.InterpolationMode = InterpolationMode.HighQualityBicubic;
        graphics.CompositingMode = CompositingMode.SourceCopy;
        graphics.DrawImage(image, 0, 0, width, height);

        using (var output = File.Open(
            OutputPath(path, outputDirectory, SystemDrawing), FileMode.Create))
        {
            var qualityParamId = Encoder.Quality;
            var encoderParameters = new EncoderParameters(1);
            encoderParameters.Param[0] = new EncoderParameter(qualityParamId, (long)quality);
            // Pick the built-in JPEG codec.
            var codec = ImageCodecInfo.GetImageDecoders()
                .FirstOrDefault(c => c.FormatID == ImageFormat.Jpeg.Guid);
            resized.Save(output, codec, encoderParameters);
        }
    }
}

ImageSharp

ImageSharp is a brand new, pure managed code, and cross-platform image processing library. Its performance is not as good as that of libraries relying on native OS-specific dependencies, but it remains very reasonable. Its only dependency is .NET itself, which makes it extremely portable: there is no additional package to install, just reference ImageSharp itself, and you’re done.
If you decide to use ImageSharp, don’t include the package that shows on NuGet: that’s going to be an empty placeholder until the first official release of ImageSharp ships. For the moment, you need to get a nightly build from a MyGet feed. This can be done by adding the following NuGet.config to the root directory of the project:


<?xml version="1.0" encoding="utf-8"?>
<configuration>
  <packageSources>
    <add key="ImageSharp Nightly" value="https://www.myget.org/F/imagesharp/api/v3/index.json" />
  </packageSources>
</configuration>
Resizing an image with ImageSharp is very simple.


// inputPath and outputPath are supplied by the surrounding benchmark harness.
using System.IO;
using ImageSharp;

const int size = 150;
const int quality = 75;

Configuration.Default.AddImageFormat(new JpegFormat());

using (var input = File.OpenRead(inputPath))
{
    using (var output = File.OpenWrite(outputPath))
    {
        var image = new Image(input)
            .Resize(new ResizeOptions
            {
                Size = new Size(size, size),
                Mode = ResizeMode.Max
            });
        image.ExifProfile = null;     // drop metadata to keep the thumbnail small
        image.Quality = quality;
        image.Save(output);
    }
}
For a new codebase, the library is surprisingly complete. It includes all the filters you’d expect for processing images, and even includes very comprehensive support for reading and writing EXIF tags (that code is shared with Magick.NET):


var exif = image.ExifProfile;
var description = exif.GetValue(ImageSharpExifTag.ImageDescription);
var yearTaken = DateTime.ParseExact(
        (string)exif.GetValue(ImageSharpExifTag.DateTimeOriginal).Value,
        "yyyy:MM:dd HH:mm:ss",
        CultureInfo.InvariantCulture)
    .Year;
var author = exif.GetValue(ImageSharpExifTag.Artist);
var copyright = $"{description} (c) {yearTaken} {author}";
exif.SetValue(ImageSharpExifTag.Copyright, copyright);
Note that the latest builds of ImageSharp are more modular than they used to be, so if you’re going to use image formats such as Jpeg, or image processing capabilities such as Resize, you need to import additional packages alongside the core ImageSharp package (respectively ImageSharp.Formats.Jpeg and ImageSharp.Processing).

Magick.NET

Magick.NET is the .NET wrapper for the popular ImageMagick library. ImageMagick is an open-source, cross-platform library that focuses on image quality, and on offering a very wide choice of supported image formats. It also has the same support for EXIF as ImageSharp.
The .NET Core build of Magick.NET currently only supports Windows. The author of the library, Dirk Lemstra, is looking for help with converting build scripts for the native ImageMagick dependency, so if you have some expertise building native libraries on Mac or Linux, this is a great opportunity to help an awesome project.
Magick.NET has the best image quality of all the libraries discussed in this post, as you can see in the samples below, and it performs relatively well. It also has a very complete API, and the best support for exotic file formats.


using ImageMagick;

const int size = 150;
const int quality = 75;

using (var image = new MagickImage(inputPath))
{
    image.Resize(size, size);
    image.Strip();
    image.Quality = quality;
    image.Write(outputPath);
}

SkiaSharp

SkiaSharp is the .NET wrapper, maintained by the Xamarin team, for Google’s Skia cross-platform 2D graphics library. SkiaSharp is now compatible with .NET Core, and is extremely fast. As it relies on native libraries, its installation can be tricky, but I was able to make it work easily on Windows and macOS. Linux is currently more challenging, as it’s necessary to build some native libraries from source, but the team is working on ironing out those speed bumps, so SkiaSharp should soon be a very interesting option.


// inputPath, path, outputDirectory and OutputPath(...) come from the surrounding benchmark harness.
using System.IO;
using SkiaSharp;

const int size = 150;
const int quality = 75;

using (var input = File.OpenRead(inputPath))
{
    using (var inputStream = new SKManagedStream(input))
    {
        using (var original = SKBitmap.Decode(inputStream))
        {
            int width, height;
            if (original.Width > original.Height)
            {
                width = size;
                height = original.Height * size / original.Width;
            }
            else
            {
                width = original.Width * size / original.Height;
                height = size;
            }

            using (var resized = original
                .Resize(new SKImageInfo(width, height), SKBitmapResizeMethod.Lanczos3))
            {
                if (resized == null) return;

                using (var image = SKImage.FromBitmap(resized))
                {
                    using (var output =
                        File.OpenWrite(OutputPath(path, outputDirectory, SkiaSharpBitmap)))
                    {
                        image.Encode(SKImageEncodeFormat.Jpeg, quality)
                            .SaveTo(output);
                    }
                }
            }
        }
    }
}

FreeImage-dotnet-core

This library is to the native FreeImage library what Magick.NET is to ImageMagick: a .NET Core wrapper. It offers a nice choice of image formats, good performance, and good visual quality. Cross-platform support is not perfect at this point, however, as I was unable to save images to disk on Linux and macOS. Hopefully that will be fixed soon.


using FreeImageAPI;

const int size = 150;

using (var original = FreeImageBitmap.FromFile(path))
{
    int width, height;
    if (original.Width > original.Height)
    {
        width = size;
        height = original.Height * size / original.Width;
    }
    else
    {
        width = original.Width * size / original.Height;
        height = size;
    }

    var resized = new FreeImageBitmap(original, width, height);
    // JPEG_QUALITYGOOD is 75 JPEG.
    // JPEG_BASELINE strips metadata (EXIF, etc.)
    resized.Save(OutputPath(path, outputDirectory, FreeImage), FREE_IMAGE_FORMAT.FIF_JPEG,
        FREE_IMAGE_SAVE_FLAGS.JPEG_QUALITYGOOD |
        FREE_IMAGE_SAVE_FLAGS.JPEG_BASELINE);
}

MagicScaler

MagicScaler is a Windows-only library that relies on Windows Image Components (WIC) for handling the images, but applies its own algorithms for very high quality resampling. It’s not a general purpose 2D library, but one that focuses exclusively on image resizing. As you can see in the gallery below, the results are impressive: the library is extremely fast, and achieves unparalleled quality. The lack of cross-platform support is going to be a showstopper to many applications, but if you can afford to run on Windows only, and only need image resizing, this is a superb choice.


using System.IO;
using PhotoSauce.MagicScaler;

const int size = 150;
const int quality = 75;

var settings = new ProcessImageSettings()
{
    Width = size,
    Height = size,
    ResizeMode = CropScaleMode.Max,
    SaveFormat = FileFormat.Jpeg,
    JpegQuality = quality,
    JpegSubsampleMode = ChromaSubsampleMode.Subsample420
};

using (var output = new FileStream(OutputPath(path, outputDirectory, MagicScaler), FileMode.Create))
{
    MagicImageProcessor.ProcessImage(path, output, settings);
}

Performance comparison

The first benchmark loads, resizes, and saves images on disk as JPEGs with a quality of 75. I used 12 images with a good variety of subjects, and details that are not too easy to resize, so that defects are easy to spot. The images are roughly one-megapixel JPEGs, except for one that is a little smaller. Your mileage may vary depending on what type of images you need to work with. I’d recommend you try to reproduce these results with a sample of images that corresponds to your own use case.
For the second benchmark, an empty megapixel image is resized to a 150 pixel wide thumbnail, without disk access.
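For readers who want to reproduce the measurements, a simplified timing harness along these lines (not the author's actual benchmark code) is enough to get per-image numbers:

using System;
using System.Diagnostics;

static class ResizeBenchmark
{
    // Times an arbitrary resize action over several iterations and reports ms per image.
    // This is a simplified stand-in for the benchmark harness used in the article.
    public static double MeasureMsPerImage(Action resizeOnce, int iterations = 100)
    {
        resizeOnce();                       // warm-up: JIT, caches, lazy initialization
        var watch = Stopwatch.StartNew();
        for (var i = 0; i < iterations; i++)
            resizeOnce();
        watch.Stop();
        return watch.Elapsed.TotalMilliseconds / iterations;
    }
}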
The benchmarks use .NET Core 1.0.3 (the latest LTS at this date) for CoreCompat.System.Drawing, ImageSharp, and Magick.NET, and Mono 4.6.2 for SkiaSharp.
I ran the benchmarks on Windows on an HP Z420 workstation with a quad-core Xeon E5-1620 processor, 16GB of RAM, and the built-in Radeon GPU. For Linux, the results are for the same machine as Windows, but in a 4GB VM, so the lower numbers say nothing about Windows vs. Linux performance; only library-to-library comparisons should be considered meaningful. The macOS numbers are from an iMac with a 1.4GHz Core i5 processor, 8GB of RAM, and the built-in Intel HD Graphics 5000 GPU, running macOS Sierra.
Results are going to vary substantially depending on hardware: usage and performance of the GPU and of SIMD depends on both what’s available on the machine, and on the usage the library is making of it. Developers wanting to get maximum performance should further experiment. I should mention that I had to disable OpenCL on Magick.NET (OpenCL.IsEnabled = false;), as I was getting substantially worse performance with it enabled on that workstation than on my laptop.

Image Resizing Performance (Windows)

Library                      Load, resize, save (ms per image)    Resize (ms per image)
CoreCompat.System.Drawing    34 ± 1                               16.0 ± 0.6
ImageSharp                   45 ± 1                               10.1 ± 0.2
Magick.NET                   56 ± 2                               24.1 ± 0.3
SkiaSharp                    20 ± 1                                7.8 ± 0.1
FreeImage                    39 ± 1                               12.9 ± 0.1
MagicScaler                  20 ± 1                               n/a

For both metrics, lower is better.
Image Resizing Performance (macOS)

Library                      Load, resize, save (ms per image)    Resize (ms per image)
CoreCompat.System.Drawing    93 ± 1                               71.5 ± 0.3
ImageSharp                   70.3 ± 0.3                           27.6 ± 0.2
SkiaSharp                    15.9 ± 0.1                            7.6 ± 0.1
FreeImage                    n/a                                  12.8 ± 0.1

For both metrics, lower is better.
Image Resizing Performance (Linux)

Library                      Load, resize, save (ms per image)    Resize (ms per image)
CoreCompat.System.Drawing    114 ± 5                              92 ± 1
ImageSharp                   384 ± 5                              128 ± 1
FreeImage                    n/a                                  29.7 ± 2

For both metrics, lower is better.
File Size

Library                      File size (average kB per image)
CoreCompat.System.Drawing    4.0
ImageSharp                   4.0
Magick.NET                   4.2
SkiaSharp                    3.6
FreeImage                    3.6
MagicScaler                  4.3

Lower is better. Note that file size is affected by the quality of the subsampling that’s being performed, so size comparisons should take into account the visual quality of the end result.

Quality comparison

Here are the resized images. As you can see, the quality varies a lot from one image to the next, and between libraries. Some images show dramatic differences in sharpness, and some moiré effects can be seen in places. You should make a choice based on the constraints of your project, and on the performance vs. quality trade-offs you’re willing to make.