Thanks to NumSharp's new array slicing capabilities, the .NET community is one step closer to having a powerful open-source machine learning platform. Python is the go-to machine learning language in part because it has great libraries like NumPy and TensorFlow. C# developers, however, also have a great need for powerful open-source libraries for machine learning and data science. NumSharp, the C# port of NumPy from the SciSharp STACK organization, has recently taken a big step forward by fully implementing slicing, which allows arbitrary subsets of N-dimensional arrays to be created as efficient views of the raw data. This makes it a useful tool for machine learning in C# in conjunction with TensorFlow.NET.
What's the big deal?
If you haven't used NumPy, you probably don't know how great slicing is. Python lists can return a slice of their elements through the indexing syntax a[start:stop:step]. But only with NumPy's sophisticated array implementation does slicing become a truly powerful data manipulation technique, one without which machine learning or data science would be unimaginable. Fortunately, for those who can't or don't want to switch to Python for machine learning (a switch I made too), NumSharp brings this capability to the .NET world. As one of the developers of NumSharp, I'll walk you through some important slicing use cases with sample code snippets in C#. Note that indexing cannot be written in C# exactly as in Python because the language syntax differs. We decided to keep the Python syntax for slice definitions, however, so slices are indexed with strings in C#. Check out this example to see how close NumSharp is to NumPy.
[Listing: Cut out a column from a matrix in Python/NumPy]
When written in C# with NumSharp, the code is almost identical. Note that slices are indexed slightly differently, using strings as parameters for the indexer.
[Listing: Cut out a column from a matrix in C#/NumSharp]
As you can see, the NumSharp team has put a lot of effort into making the code as similar to Python as possible. This matters because existing Python code that relies on NumPy can now be ported to C# with little effort.
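To make the idea of string-based slice definitions more concrete, here is a minimal Python sketch of how a slice string in Python syntax could be translated into a slice object. The helpers parse_slice and parse_index are hypothetical names invented for this illustration; this is not NumSharp's actual parser, just one plausible way to do the mapping.

```python
def parse_slice(text):
    """Turn a slice string such as "1:7:2" into a slice object.

    Hypothetical helper for illustration only, not NumSharp's parser.
    A string without a colon (e.g. "2") is treated as a plain index.
    """
    parts = [p.strip() for p in text.split(":")]
    if len(parts) == 1:
        return int(parts[0])
    if len(parts) > 3:
        raise ValueError("expected at most start:stop:step")
    # Empty fields mean "use the default", exactly as in a[::2].
    return slice(*(int(p) if p else None for p in parts))


def parse_index(text):
    """Split a multi-dimensional definition such as ":,2" per dimension."""
    return tuple(parse_slice(part) for part in text.split(","))


data = list(range(10))
print(data[parse_slice("1:7:2")])   # [1, 3, 5]
print(data[parse_slice("::-1")])    # [9, 8, 7, 6, 5, 4, 3, 2, 1, 0]
print(parse_index(":,2"))           # (slice(None, None, None), 2)
```

Keeping the definition a plain string means a slice written for NumPy can be pasted into a string literal almost unchanged.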
Use case: Use multiple views of the same data
Being able to pass only local parts of the underlying data (i.e., small chunks of large images) in and out of functions without copying is critical for runtime performance, especially for large data sets. Slices are indexed using local coordinates, so your algorithm doesn't need to know the global structure of your data, effectively simplifying your life and ensuring maximum performance because unnecessary duplication is avoided.
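The effect of passing a view instead of a copy can be sketched with nothing but Python's standard library. The example below uses the built-in memoryview as a stand-in for an array view (an assumption made for illustration; NumSharp's NDArray has its own API), and brighten is a hypothetical function that only ever sees local coordinates.

```python
from array import array


def brighten(chunk, amount):
    """Add `amount` to every element, using only local indices 0..len-1."""
    for i in range(len(chunk)):
        chunk[i] = chunk[i] + amount


pixels = array("i", range(10))   # stand-in for a large raw data buffer
whole = memoryview(pixels)       # view of the whole buffer, no copy
chunk = whole[3:6]               # view of elements 3, 4 and 5, still no copy

brighten(chunk, 100)             # writes go straight through to `pixels`

print(pixels.tolist())           # [0, 1, 2, 103, 104, 105, 6, 7, 8, 9]
print(chunk[0] == pixels[3])     # True
```

The function never learns where the chunk sits inside the larger buffer, yet its writes land in the right place.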
Use cases: Sparse views and recursive slicing
A sparse view of an array can be created by specifying a step in addition to the start and end of the slice range. As far as I know, even C# 8.0 with its new array slice syntax can't do this. This feature becomes very important when dealing with interleaved data. You can design your algorithm to handle contiguous data and feed it sparse slices that mimic contiguous data sources, which keeps the complexity of your algorithm to a minimum.
Slices can be sliced further, which is a very important feature when you are dealing with high-dimensional data. This also helps reduce the complexity of an algorithm, because recursive slicing reduces the dimensionality of the data.
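Both ideas, stepped views over interleaved data and slicing a slice again, can be sketched in plain Python. The example assumes interleaved stereo samples as the data and again uses the built-in memoryview as a stand-in for an array view; peak is a hypothetical function written as if its input were contiguous.

```python
from array import array

# Interleaved stereo samples: left, right, left, right, ...
samples = array("h", [10, -10, 20, -20, 30, -30, 40, -40])
raw = memoryview(samples)

left = raw[0::2]    # sparse view: every second element, starting at 0
right = raw[1::2]   # sparse view: every second element, starting at 1


def peak(channel):
    """Largest absolute sample; knows nothing about interleaving."""
    return max(abs(x) for x in channel)


print(left.tolist())    # [10, 20, 30, 40]
print(peak(right))      # 40

# A view can be sliced again, and the result is still a view of `samples`.
inner = left[1:][::2]   # elements 1 and 3 of the left channel
print(inner.tolist())   # [20, 40]
inner[1] = 0            # writes through to the raw buffer
print(samples[6])       # 0
```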
Use case: Efficiently process high-dimensional data
If you need to think of an array of data as a volume and work with its parts without having to do incredible coordinate transformation calculations, then .reshape() is your friend. All arrays created by .reshape() or by slicing operations are just views of the original data. When you iterate over a view, or read or write its elements, you access the raw data array. NumSharp transparently performs the appropriate index transformations for you, so you can index slices with relative coordinates.
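The principle that a reshaped array is only a different way of indexing the same raw data can be shown with Python's built-in memoryview, used here as a stand-in (NumSharp's own .reshape() is not shown). The sketch assumes a flat buffer of 12 integers viewed as 3 rows by 4 columns.

```python
from array import array

raw = array("i", range(12))   # flat raw data: 0..11
flat = memoryview(raw)

# memoryview.cast can only change the shape via a byte view,
# hence the two-step cast. No data is copied at any point.
grid = flat.cast("B").cast("i", shape=[3, 4])

print(grid.shape)             # (3, 4)
print(grid[1, 2])             # 6, the same element as raw[1 * 4 + 2]

grid[2, 3] = -1               # write through the 2D view...
print(raw[11])                # -1  ...and the flat raw data changes
```

The index transformation (row * 4 + column) happens inside the view, so calling code can stay in 2D coordinates.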
Use case: Reverse the order of elements at no additional cost
Slices with a negative step reverse the order of the elements. The advantage is that no data has to be copied or enumerated to do this, much like IEnumerable.Reverse(). The difference is that the view (the result of the operation a["::-1"]) presents the data in reverse order, and you can index into that reversed sequence without ever enumerating it.
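Here is a small Python sketch of the same behavior, with the built-in memoryview once more standing in for an array view. It contrasts a reversed view, which supports random access, with a reverse iterator, which can only be consumed element by element.

```python
from array import array

data = array("i", [1, 2, 3, 4, 5])
view = memoryview(data)[::-1]   # negative step: reversed view, no copy

print(view.tolist())            # [5, 4, 3, 2, 1]
print(view[0], view[-1])        # 5 1  (random access in reversed order)

data[4] = 50                    # change the raw data...
print(view[0])                  # 50   ...and the view reflects it

it = reversed(data)             # a reverse iterator, by contrast,
print(next(it))                 # 50   only hands out one element at a time
```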
Use case: Reduce complexity by reducing dimensions
When working with high-dimensional data, the algorithms for that data can also become very complex. While working on the ToString() method of NumSharp's NDArray, which can print out any high-dimensional volume, I noticed how simple and beautiful the algorithm became by systematically and recursively cutting ND volumes into (N-1)D volumes, and so on. This divide-and-conquer approach obtains the lower-dimensional subvolumes by slicing with NumSharp's index notation rather than its range notation.
Range notation vs. index notation
The range notation ["start:stop:step"] allows you to access a subrange of a given volume with the same number of dimensions. So even if you cut out only one column of a 2D matrix, you still get a 2D matrix with only one column. Here's a short piece of C# code that demonstrates this:
[Listing: Slice a column using range notation]
The index notation gives you an (N-1)-dimensional slice at the specified location of the N-dimensional parent volume. So cutting out a column from a 2D matrix using index notation gives you a 1D vector:
[Listing: Slice a column using index notation]
If you didn't spot the difference at a glance, here are the two slice definitions from above side by side: range[":,2:3"] vs index[":,2"]. This small difference has a big impact on the result. A full reference of the new slice notation can be found on the NumSharp wiki.
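The difference in the resulting shapes can be emulated with plain nested Python lists. The helpers column_range and column_index are hypothetical names for this sketch, and unlike NumSharp slices they copy the data; the only point here is how many dimensions survive.

```python
matrix = [[1, 2, 3],
          [4, 5, 6],
          [7, 8, 9]]


def column_range(m, start, stop):
    """Emulates m[":,start:stop"]: the result keeps both dimensions."""
    return [row[start:stop] for row in m]


def column_index(m, col):
    """Emulates m[":,col"]: the column dimension is dropped."""
    return [row[col] for row in m]


print(column_range(matrix, 2, 3))   # [[3], [6], [9]]  (still 2D, 3 x 1)
print(column_index(matrix, 2))      # [3, 6, 9]        (1D, length 3)
```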
Note: ArraySlice<T>
While implementing the slicing of N-dimensional views, I concluded that it might be interesting for many other areas in .NET, so I factored it out into a standalone library of its own called SliceAndDice. It features ArraySlice<T>, a lightweight wrapper for indexing any C# data structure (such as T[] or IList<T>), and allows you to use the same reshaping, slicing, and viewing mechanisms without all the other heavy numerical computations. It only takes a few hundred lines of code to achieve excellent slicing capabilities!
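To give a feeling for how little machinery a one-dimensional view wrapper needs, here is a minimal Python sketch. SeqView is a hypothetical class written for this illustration; it is not the API of SliceAndDice or ArraySlice<T>, and it only covers the 1D case with a start offset and a step.

```python
class SeqView:
    """Strided, copy-free view over any indexable sequence (sketch only)."""

    def __init__(self, base, start=0, step=1, length=None):
        self._base = base
        self._start = start
        self._step = step
        self._length = len(base) if length is None else length

    def __len__(self):
        return self._length

    def _locate(self, i):
        """Translate a local index into an index of the base sequence."""
        if i < 0:
            i += self._length
        if not 0 <= i < self._length:
            raise IndexError("view index out of range")
        return self._start + i * self._step

    def __getitem__(self, key):
        if isinstance(key, slice):
            start, stop, step = key.indices(self._length)
            length = len(range(start, stop, step))
            # A slice of a view is another view over the same base.
            return SeqView(self._base, self._start + start * self._step,
                           self._step * step, length)
        return self._base[self._locate(key)]

    def __setitem__(self, i, value):
        self._base[self._locate(i)] = value   # writes through to the base

    def to_list(self):
        return [self[i] for i in range(self._length)]


data = list(range(10))
even = SeqView(data)[::2]      # 0, 2, 4, 6, 8
sub = even[1:4]                # 2, 4, 6 (a view of a view)
sub[0] = 99                    # lands in data[2]

print(sub.to_list())                         # [99, 4, 6]
print(data[2])                               # 99
print(SeqView(data)[::-1][1:4].to_list())    # [8, 7, 6]
```

Each view stores only an offset, a step and a length, so nesting slices never copies the underlying elements.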
Wrapping up
NumSharp has recently been given the same slicing and viewing mechanism that undoubtedly makes NumPy one of the most important libraries in the Python machine learning ecosystem. SciSharp STACK is an open-source organization of a small number of skilled developers who have worked very hard to bring the same functionality to the .NET world. NumSharp's latest improvements are an important cornerstone in achieving this.