Discussion:
Data, enumerateBytes: separate blocks?
Daryle Walker
2017-11-28 06:36:54 UTC
Is there a way to make a (NS)Data that uses multiple contiguous blocks? Besides generating a multi-gig file and hoping, that is. I’m using enumerateBytes for efficiency and need to test sequences that cross sub-blocks.

_______________________________________________

Cocoa-dev mailing list (Cocoa-***@lists.apple.com)

Please do not post admin requests or moderator comments to the list.
Contact the moderators at cocoa-dev-admins(at)lists.apple.com

Help/Unsubscribe/Update your Subscription:
https://lists.apple.com/mailman/options/cocoa-dev/gegs%40ml-in.narkive.net

This email sent to ***@ml-in.na
Quincey Morris
2017-11-28 06:51:19 UTC
Post by Daryle Walker
Is there a way to make a (NS)Data that uses multiple contiguous blocks?
You mean multiple *non-contiguous* blocks??

The only way I know of forcing Data to use multiple blocks is via DispatchData.

(In recent OS versions, Data and DispatchData are sort of the same thing, but I don’t remember where this is described.)

Wim Lewis
2017-11-28 19:32:19 UTC
Post by Quincey Morris
Post by Daryle Walker
Is there a way to make a (NS)Data that uses multiple contiguous blocks?
You mean multiple *non-contiguous* blocks??
The only way I know of forcing Data to use multiple blocks is via DispatchData.
(In recent OS versions, Data and DispatchData are sort of the same thing, but I don’t remember where this is described.)
From an ObjC perspective, dispatch_data_t is one of the concrete subclasses of NSData. Swift doesn't know this, though, so you have to jump through some hoops.
The only documentation of this is in the 10.9 release notes:
dispatch_data_t -> NSData bridging
In 64-bit apps using either manual retain/release or ARC, dispatch_data_t can now be freely cast to NSData *, though not vice versa. Note that one implication of this is that NSData objects created by Cocoa may now contain several discontiguous pieces of data. You can efficiently work with discontiguous ranges of data by using the new
- (void) enumerateByteRangesUsingBlock:(void (^)(const void *bytes, NSRange byteRange, BOOL *stop))block
API on NSData. This will be roughly the same speed as -bytes on a contiguous NSData, but avoid allocation and copying for a discontiguous one. Once a discontiguous NSData is compacted to a contiguous one (generally by calling -bytes, other NSData API will do discontiguous accesses), future accesses to the contiguous region will not require additional copying.
Various system APIs (in particular NSFileHandle) have been updated to use discontiguous data for improved performance, so it's best to structure your code to handle it unless it absolutely needs contiguous bytes.
Quincey Morris
2017-11-28 20:54:27 UTC
Post by Wim Lewis
The only documentation of this is in the 10.9 release notes
That’s what I was thinking of — thanks for clarifying.

I believe that there is one more little piece to this that’s more recent. Since (after that 10.9 change) the NSData class had to become (publicly) aware that subclasses might contain discontiguous data, the opportunity arose for Cocoa to leverage this in other scenarios, where dispatch_data_t (aka DispatchData in Swift) wasn’t involved. That’s good in general, as a performance enhancement for code that cares to enumerate the block ranges, but it happens behind the scenes.

By contrast, AFAIK the only mechanism for 3rd party code to *forcibly* create NSData objects with discontiguous data buffers is via dispatch_data_t/DispatchData. For that reason, it might make more sense for Daryle to work in the DispatchData domain rather than the plain Data domain. However, as you say, there’s a bridge involving some simple hoops available if necessary.


Daryle Walker
2017-12-19 22:57:42 UTC
Post by Quincey Morris
I believe that there is one more little piece to this that’s more recent. Since (after that 10.9 change) the NSData class had to become (publicly) aware that subclasses might contain discontiguous data, the opportunity arose for Cocoa to leverage this in other scenarios, where dispatch_data_t (aka DispatchData in Swift) wasn’t involved. That’s good in general, as a performance enhancement for code that cares to enumerate the block ranges, but it happens behind the scenes.
By contrast, AFAIK the only mechanism for 3rd party code to *forcibly* create NSData objects with discontiguous data buffers is via dispatch_data_t/DispatchData. For that reason, it might make more sense for Daryle to work in the DispatchData domain rather than the plain Data domain. However, as you say, there’s a bridge involving some simple hoops available if necessary.
What are the hoops/bridges required?


Daryle Walker
Mac, Internet, and Video Game Junkie
darylew AT mac DOT com

Quincey Morris
2017-12-20 01:08:54 UTC
Post by Daryle Walker
What are the hoops/bridges required?
I think I was referring to what Wim Lewis said, which is that you can create DispatchData values (or perhaps dispatch_data_t objects), but you’re going to have to forcibly cast from dispatch_data_t to its superclass, and then bridge that to Data.

However, if you’re going to the trouble of creating DispatchData values, you may as well use those directly, rather than bridging across to Data. The decision may depend on exactly which APIs you need to use to process the data.

Daryle Walker
2017-12-22 02:04:32 UTC
Post by Quincey Morris
Post by Daryle Walker
What are the hoops/bridges required?
I think I was referring to what Wim Lewis said, which is that you can create DispatchData values (or perhaps dispatch_data_t objects), but you’re going to have to forcibly cast from dispatch_data_t to its superclass, and then bridge that to Data.
However, if you’re going to the trouble of creating DispatchData values, you may as well use those directly, rather than bridging across to Data. The decision may depend on exactly which APIs you need to use to process the data.
// Find the first occurrence of each potential terminator byte value.
var firstCr, firstLf: Index?
enumerateBytes { buffer, start, stop in
    if let localLf = buffer.index(of: ParsingQueue.Constants.lf) {
        firstLf = start.advanced(by: buffer.startIndex.distance(to: localLf))
        stop = true
    }
    if let firstCrIndex = firstCr, firstCrIndex.distance(to: start.advanced(by: buffer.count)) > 2 {
        // No block after this current one could find a LF close enough to form CR-LF or CR-CR-LF.
        stop = true
    } else if let localCr = buffer.index(of: ParsingQueue.Constants.cr) {
        firstCr = start.advanced(by: buffer.startIndex.distance(to: localCr))
        stop = true
    }
}
(This is in an extension of Data.) Right now, I have no way to activate the second IF-block for testing.


Quincey Morris
2017-12-22 04:10:27 UTC
when multiple blocks are used
import Foundation
let buffer1: [UInt8] = [1,2]
let buffer2: [UInt8] = [3,4,5,6]
var data = DispatchData.empty
data.append (buffer1.withUnsafeBytes { DispatchData (bytesNoCopy: $0) })
data.append (buffer2.withUnsafeBytes { DispatchData (bytesNoCopy: $0) })
print (data.count)
data.enumerateBytes {
    bytes, index, stop in
    print (bytes.count, index, bytes [0])
}

6
2 0 1
4 2 3
Isn’t this what you were asking for: a loop enumerating each of the contiguous portions of a larger data set?
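
For anyone reusing the experiment outside a playground, here is a minimal sketch of the same idea that copies each buffer instead of borrowing it. The reason for the change: `bytesNoCopy` lets the pointer outlive the `withUnsafeBytes` closure and defaults to a `.free` deallocator, which is not appropriate for array storage, whereas `DispatchData(bytes:)` copies. The helper names `makeDiscontiguousData` and `regionLayout` are hypothetical, not part of any library.

```swift
import Foundation
import Dispatch

/// Hypothetical helper: builds a DispatchData with one region per input buffer.
/// Each buffer is copied, so nothing depends on the arrays staying alive.
func makeDiscontiguousData(_ buffers: [[UInt8]]) -> DispatchData {
    var result = DispatchData.empty
    for buffer in buffers {
        buffer.withUnsafeBytes { result.append(DispatchData(bytes: $0)) }
    }
    return result
}

/// Hypothetical helper: records the start offset and length of every contiguous region.
func regionLayout(of data: DispatchData) -> [(start: Int, count: Int)] {
    var layout: [(start: Int, count: Int)] = []
    data.enumerateBytes { bytes, index, _ in
        layout.append((start: index, count: bytes.count))
    }
    return layout
}

let twoRegions = makeDiscontiguousData([[1, 2], [3, 4, 5, 6]])
let layout = regionLayout(of: twoRegions)
```

With the two buffers above, `layout` should describe the same two regions as the printed output in the original example (offsets 0 and 2, lengths 2 and 4).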

Daryle Walker
2017-12-22 16:48:47 UTC
Post by Quincey Morris
when multiple blocks are used
import Foundation
let buffer1: [UInt8] = [1,2]
let buffer2: [UInt8] = [3,4,5,6]
var data = DispatchData.empty
data.append (buffer1.withUnsafeBytes { DispatchData (bytesNoCopy: $0) })
data.append (buffer2.withUnsafeBytes { DispatchData (bytesNoCopy: $0) })
print (data.count)
data.enumerateBytes {
    bytes, index, stop in
    print (bytes.count, index, bytes [0])
}

6
2 0 1
4 2 3
Isn’t this what you were asking for: a loop enumerating each of the contiguous portions of a larger data set?
I already have the code as an extension of Data. In C, we can use pointer type-punning shenanigans to convert between a dispatch_data_t and NSData*. To trigger this code in a test that I would write in Swift, DispatchData would need to be convertible to Data. Is there a way to do the conversion in Swift? It doesn’t seem obvious since DispatchData and Data are value types, not pointer nor reference types.

I just checked it out with your code above in a playground. I added an extension to Data called “printHello” and called it with a Data object I also added. Sure enough, the DispatchData object could not call that same method.


Quincey Morris
2017-12-22 19:18:13 UTC
Post by Daryle Walker
DispatchData would need to be convertible to Data. Is there a way to do the conversion in Swift?
Actually, on consideration, I think not. It would be if DispatchData was bridgeable like Data, but it isn’t, and I don’t see any way of extracting its underlying reference. This leaves you with two options that I can see:

1. Use an Obj-C helper function, taking an array of input buffers, and returning a dispatch_data_t object that combines them, cast to a NSData*. You can then use the returned reference as Data.

2. Move your Data extension to DispatchData. That’s what I was asking about earlier — is there any reason why you couldn’t just use DispatchData rather than Data, in all the code that deals with this data set? In that case, you can just build the DispatchData in Swift.

IAC, you should probably submit a bug report. Since dispatch_data_t is documented to be a subclass of NSData, there should probably be a mechanism for getting Data and DispatchData values as equivalents of each other, without any unprovoked copying of the underlying data.

Daryle Walker
2017-12-24 07:03:51 UTC
Post by Quincey Morris
Post by Daryle Walker
DispatchData would need to be convertible to Data. Is there a way to do the conversion in Swift?
1. Use an Obj-C helper function, taking an array of input buffers, and returning a dispatch_data_t object that combines them, cast to a NSData*. You can then use the returned reference as Data.
2. Move your Data extension to DispatchData. That’s what I was asking about earlier — is there any reason why you couldn’t just use DispatchData rather than Data, in all the code that deals with this data set? In that case, you can just build the DispatchData in Swift.
IAC, you should probably submit a bug report. Since dispatch_data_t is documented to be a subclass of NSData, there should probably be a mechanism for getting Data and DispatchData values as equivalents of each other, without any unprovoked copying of the underlying data.
This code is not for private use within an app, but for something I plan to publicize as a library on GitHub. So the interface has to stay as using Data. (Fortunately, this part of the interface, an extension to Data, is private.) DispatchData and Data don’t even have a custom shared interface (just the general RandomAccessCollection) I could use here to not repeat myself in implementation.

The library doesn’t need DispatchData, just my test case (and only as posing as a Data instance). It doesn’t seem I can do it without heroic measures by making a mixed Swift/Objective-C test project. And even if that is possible, I don’t even know if the Swift Package Manager supports it. So I have to let my meticulous side get bothered by the red-0-calls under Code Coverage mocking me until this is fixed somehow.

Bug #36204480 (“DispatchData IS-NOT-A Data in Swift”).


Quincey Morris
2017-12-24 08:14:48 UTC
Post by Daryle Walker
The library doesn’t need DispatchData, just my test case (and only as posing as a Data instance).
You *could* try moving the logic that’s inside your current enumerateBytes closure to a new method, and just call the method from the closure. Then, write another method in the test target, similar to the one that uses enumerateBytes, but manually breaks the original Data object into smaller ones (based on, say, an array of start indexes that a test can pass in), and feed those sequentially to the new method with the moved logic.

That would test your boundary-crossing code, which seems to be the point here. (It would actually test it better, because you could automate testing of lots of boundary positions.)
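
A rough sketch of that refactoring, under stated assumptions: the scanning logic lives in a small value type (`TerminatorScanner`, a hypothetical name) that is fed one chunk at a time, and a test helper (`scan(_:splitAt:)`, also hypothetical) cuts a Data value at arbitrary offsets. The stop rule here is a simplified stand-in for the closure posted earlier, not a copy of it.

```swift
import Foundation

/// Hypothetical stand-in for the logic that used to live inside the enumerateBytes closure.
struct TerminatorScanner {
    static let cr: UInt8 = 0x0D
    static let lf: UInt8 = 0x0A
    private(set) var firstCr: Int?
    private(set) var firstLf: Int?

    /// Examines one chunk whose first byte sits at absolute offset `start`.
    /// Returns true once no later chunk could change the result.
    mutating func consume<S: Sequence>(_ chunk: S, startingAt start: Int) -> Bool
        where S.Element == UInt8 {
        var position = start
        for byte in chunk {
            if byte == TerminatorScanner.lf {
                firstLf = position          // bytes are visited in order, so this is the first LF
                return true
            }
            if byte == TerminatorScanner.cr, firstCr == nil {
                firstCr = position
            }
            if let cr = firstCr, position - cr >= 2 {
                return true                 // too far past the CR for CR-LF or CR-CR-LF
            }
            position += 1
        }
        return false
    }
}

/// Test helper: cuts `data` at the given absolute offsets and feeds the pieces in order.
func scan(_ data: Data, splitAt boundaries: [Int]) -> TerminatorScanner {
    var scanner = TerminatorScanner()
    let bytes = [UInt8](data)
    let inner = boundaries.filter { $0 > 0 && $0 < bytes.count }.sorted()
    let cuts = [0] + inner + [bytes.count]
    for (lower, upper) in zip(cuts, cuts.dropFirst()) where lower < upper {
        if scanner.consume(bytes[lower..<upper], startingAt: lower) { break }
    }
    return scanner
}

// "a", CR, CR, LF, "b": try every possible single cut position.
let sample = Data([0x61, 0x0D, 0x0D, 0x0A, 0x62])
let results = (0...sample.count).map { scan(sample, splitAt: [$0]) }
```

Because the cut positions are just an array, a test can loop over every boundary, which is the automation mentioned above; the production code would call the same `consume` method from inside its enumerateBytes closure.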
Post by Daryle Walker
It doesn’t seem I can do it without heroic measures by making a mixed Swift/Objective-C test project. And even if that is possible, I don’t even know if the Swift Package Manager supports it.
Or, do this anyway. A mixed project is not “heroic measures”; it’s likely one file with 5 lines of Obj-C code. And surely you can test whether SwiftPM supports the mix in about 5 seconds, just with placeholder code in a single .m file.

Charles Srstka
2017-12-24 12:45:48 UTC
Post by Daryle Walker
Post by Quincey Morris
Post by Daryle Walker
DispatchData would need to be convertible to Data. Is there a way to do the conversion in Swift?
1. Use an Obj-C helper function, taking an array of input buffers, and returning a dispatch_data_t object that combines them, cast to a NSData*. You can then use the returned reference as Data.
2. Move your Data extension to DispatchData. That’s what I was asking about earlier — is there any reason why you couldn’t just use DispatchData rather than Data, in all the code that deals with this data set? In that case, you can just build the DispatchData in Swift.
IAC, you should probably submit a bug report. Since dispatch_data_t is documented to be a subclass of NSData, there should probably be a mechanism for getting Data and DispatchData values as equivalents of each other, without any unprovoked copying of the underlying data.
This code is not for private use within an app, but for something I plan to publicize as a library on GitHub. So the interface has to stay as using Data. (Fortunately, this part of the interface, an extension to Data, is private.) DispatchData and Data don’t even have a custom shared interface (just the general RandomAccessCollection) I could use here to not repeat myself in implementation.
Depending on what your library does, you could consider making its interface take generic collections of UInt8. Then, your APIs would accept Data, DispatchData, [UInt8], ContiguousArray<UInt8>, etc.
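
As a minimal sketch of what such a generic entry point might look like (the function name `firstLineBreak` is hypothetical, and `firstIndex(where:)` is the Swift 4.2+ spelling of the `index(where:)` used elsewhere in this thread):

```swift
import Foundation

/// Hypothetical generic entry point: finds the first CR or LF in any collection of bytes.
func firstLineBreak<C: Collection>(in bytes: C) -> C.Index? where C.Element == UInt8 {
    return bytes.firstIndex(where: { $0 == 0x0D || $0 == 0x0A })
}

// The same function accepts an array and a Data value.
let inArray = firstLineBreak(in: [UInt8]("ab\ncd".utf8))
let inData = firstLineBreak(in: Data("ab\r\ncd".utf8))
```

The trade-off is that a generic Collection only offers element-by-element access, so any per-block fast path would have to be added separately.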

Charles

Quincey Morris
2017-12-24 20:51:12 UTC
Post by Charles Srstka
you could consider making its interface take generic collections of UInt8
This would not solve the *general* problem Daryle raised. He’s looking for a way to test the logic of some buffer-boundary-crossing code, which makes sense only if he has multiple buffers, which means he must be using “enumerateBytes”, which is not supported by Collection<UInt8>. If he doesn’t use enumerateBytes, then he doesn’t need anything but Data anyway.

However, considering what appears to be the *actual* problem (finding the first CR or CR-LF or CR-CR-LF separator in a byte sequence), he could use Data without using enumerateBytes, and still not risk copying the data to a contiguous buffer.

This solution would use Data’s “index(of:)” to find the first CR, then a combination of advancing the index and subscripting to test for LF in the following 1 or 2 positions.

Charles Srstka
2017-12-25 00:18:04 UTC
Post by Quincey Morris
Post by Charles Srstka
you could consider making its interface take generic collections of UInt8
This would not solve the *general* problem Daryle raised. He’s looking for a way to test the logic of some buffer-boundary-crossing code, which makes sense only if he has multiple buffers, which means he must be using “enumerateBytes”, which is not supported by Collection<UInt8>. If he doesn’t use enumerateBytes, then he doesn’t need anything but Data anyway.
However, considering what appears to be the *actual* problem (finding the first CR or CR-LF or CR-CR-LF separator in a byte sequence), he could use Data without using enumerateBytes, and still not risk copying the data to a contiguous buffer.
This solution would use Data’s “index(of:)” to find the first CR, then a combination of advancing the index and subscripting to test for LF in the following 1 or 2 positions.
That’s basically what I was thinking (well, using Collection<UInt8>’s index(of:) rather than just Data’s). However, if enumerateBytes legitimately does need to be used, you could also create a protocol containing that method and make Data and DispatchData both retroactively conform to it.
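
A sketch of that protocol idea, with one deliberate difference that is my own assumption rather than anything from the thread: the requirement gets its own name (`forEachRegion`, hypothetical) and each conformance forwards to the type's native enumerateBytes, so the protocol does not depend on the two native signatures matching exactly. Newer toolchains may flag Data's enumerateBytes as deprecated.

```swift
import Foundation
import Dispatch

/// Hypothetical protocol exposing the one operation the scanning code needs.
protocol ByteRegionEnumerable {
    func forEachRegion(_ body: (UnsafeBufferPointer<UInt8>, Int, inout Bool) -> Void)
}

extension Data: ByteRegionEnumerable {
    func forEachRegion(_ body: (UnsafeBufferPointer<UInt8>, Int, inout Bool) -> Void) {
        // Forward to Data's own block enumeration.
        enumerateBytes { buffer, index, stop in body(buffer, index, &stop) }
    }
}

extension DispatchData: ByteRegionEnumerable {
    func forEachRegion(_ body: (UnsafeBufferPointer<UInt8>, Int, inout Bool) -> Void) {
        // Forward to DispatchData's own block enumeration.
        enumerateBytes { buffer, index, stop in body(buffer, index, &stop) }
    }
}

/// Example client: works with either type and counts the bytes seen across all regions.
func totalBytes<T: ByteRegionEnumerable>(in source: T) -> Int {
    var total = 0
    source.forEachRegion { buffer, _, _ in total += buffer.count }
    return total
}
```

Code written against the protocol can then be exercised in tests with a multi-region DispatchData while the public interface keeps taking Data.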

Charles

Daryle Walker
2017-12-25 18:23:19 UTC
Post by Quincey Morris
Post by Charles Srstka
you could consider making its interface take generic collections of UInt8
This would not solve the *general* problem Daryle raised. He’s looking for a way to test the logic of some buffer-boundary-crossing code, which makes sense only if he has multiple buffers, which means he must be using “enumerateBytes”, which is not supported by Collection<UInt8>. If he doesn’t use enumerateBytes, then he doesn’t need anything but Data anyway.
However, considering what appears to be the *actual* problem (finding the first CR or CR-LF or CR-CR-LF separator in a byte sequence), he could use Data without using enumerateBytes, and still not risk copying the data to a contiguous buffer.
This solution would use Data’s “index(of:)” to find the first CR, then a combination of advancing the index and subscripting to test for LF in the following 1 or 2 positions.
Not quite.

My first versions of this idea, pre-Swift and therefore using NSData with Objective-C, did use the direct search functions that come with the NSData API. There seems to be a detail you missed in my sample code that explains the use of “enumerateBytes”:

LF-only is also a searched-for separator.

That means no matter what, I must find the first CR and the first LF. Then I compare their relative positions (and check for another CR if the spacing is right). What happens if whichever byte value is second is gigabytes away from the first? (Or equivalently, only one value is present and there’s gigabytes of trailing data to fail to find the other value.) I would end up wasting the user’s time for a second result I’d never use.


Charles Srstka
2017-12-25 18:44:17 UTC
Post by Daryle Walker
Not quite.
LF-only is also a searched-for separator.
That means no matter what, I must find the first CR and the first LF. Then I compare their relative positions (and check for another CR if the spacing is right). What happens if whichever byte value is second is gigabytes away from the first? (Or equivalently, only one value is present and there’s gigabytes of trailing data to fail to find the other value.) I would end up wasting the user’s time for a second result I’d never use.
With either Collection<UInt8> or Data, the value that index(of:) returns for the second value will be one greater than what it returns for the first value in that case, regardless of how the data is stored under the hood.

Charles

Quincey Morris
2017-12-25 19:09:21 UTC
Post by Daryle Walker
What happens if whichever byte value is second is gigabytes away from the first?
var firstCr, firstLf: Index?
enumerateBytes { buffer, start, stop in
    if let localLf = buffer.index(of: ParsingQueue.Constants.lf) {
        firstLf = start.advanced(by: buffer.startIndex.distance(to: localLf))
        stop = true
    }
    if let firstCrIndex = firstCr, firstCrIndex.distance(to: start.advanced(by: buffer.count)) > 2 {
        // No block after this current one could find a LF close enough to form CR-LF or CR-CR-LF.
        stop = true
    } else if let localCr = buffer.index(of: ParsingQueue.Constants.cr) {
        firstCr = start.advanced(by: buffer.startIndex.distance(to: localCr))
        stop = true
    }
}
In the case where the Data object is *one* multi-GB buffer, if it doesn’t contain a LF you will search gigabytes for the non-existent LF before searching them again for the CR. Even if you’re lucky and the Data object is multiple smallish-buffers, you will still search all the buffers that don’t have a CR for a LF, before you find the one that does have a CR.

So, if your goal is to minimize searching, you have to search for CR and LF simultaneously. There are two easy ways to do this:

1. Use “index(where:)” and test for both values in the closure.

2. Use a manual loop that indexes into a buffer pointer (C-style).

#1 is the obvious choice unless invoking the closure is too slow when a lot of bytes need to be examined. #2 would use “enumerateBytes” to get a series of buffer pointers efficiently, but there is no boundary code to be tested, since you’re only examining 1 byte at a time.

Once you have the optional indices to the first CR or LF, and you find you need to check for a potential CR-LF or CR-CR-LF, you can do that by subscripting into the original Data object directly, outside of the search loop.

This approach would eliminate the problematic test case, and (unless I’m missing something obvious) have the initial search as its only O(n) computation, everything else being O(1), i.e. constant and trivial.

Daryle Walker
2017-12-27 19:59:52 UTC
Post by Quincey Morris
1. Use “index(where:)” and test for both values in the closure.
2. Use a manual loop that indexes into a buffer pointer (C-style).
#1 is the obvious choice unless invoking the closure is too slow when a lot of bytes need to be examined. #2 would use “enumerateBytes” to get a series of buffer pointers efficiently, but there is no boundary code to be tested, since you’re only examining 1 byte at a time.
Once you have the optional indices to the first CR or LF, and you find you need to check for a potential CR-LF or CR-CR-LF, you can do that by subscripting into the original Data object directly, outside of the search loop.
This approach would eliminate the problematic test case, and (unless I’m missing something obvious) have the initial search as its only O(n) computation, everything else being O(1), i.e. constant and trivial.
guard let firstBreak = index(where: {
    [MyConstants.cr, MyConstants.lf].contains($0)
}) else { return nil }
let which: Terminator
switch self[firstBreak] {
case MyConstants.cr:
    let nextBreak = index(after: firstBreak)
    if nextBreak < endIndex {
        switch self[nextBreak] {
        case MyConstants.cr:
            let nextBreak2 = index(after: nextBreak)
            if nextBreak2 < endIndex {
                if self[nextBreak2] == MyConstants.lf {
                    which = .crcrlf
                } else {
                    which = .cr
                }
            } else {
                which = .cr
            }
        case MyConstants.lf:
            which = .crlf
        default:
            which = .cr
        }
    } else {
        which = .cr
    }
case MyConstants.lf:
    which = .lf
default:
    preconditionFailure("The search from 'index' should never find anything outside {CR, LF}.")
}
return (which, firstBreak)
In my basic test suite, the property is called 37 times. The guard’s return is hit 4 times, and the outer switch 33 times. For that outer switch, the CR case is hit 25 times, the LF case 8 times, and that default I had to put in 0 times. Within the CR case, the individual results are hit 4, 6, 3, 5, 5, and 2 times respectively.

However, the guard’s contains test is covered 192 times! I’m guessing that’s once for each byte the code goes past, right? Between that and wondering how efficient the test is, I wonder if using something like [2] would be better. But I would test a megabyte at a time or something. Now I have to figure out how to divide a range into a set of subranges (of a set size, except possibly the last). And how would I test which way is faster?
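
For the subrange question, one possible approach (a sketch only; `chunks(of:size:)` is a hypothetical helper name) is to let `stride(from:to:by:)` produce the lower bounds and clamp each upper bound to the end of the range:

```swift
/// Splits `range` into consecutive subranges of at most `size` elements.
/// Only the last subrange may be shorter.
func chunks(of range: Range<Int>, size: Int) -> [Range<Int>] {
    precondition(size > 0, "chunk size must be positive")
    return stride(from: range.lowerBound, to: range.upperBound, by: size).map { lower in
        lower..<min(lower + size, range.upperBound)
    }
}

let pieces = chunks(of: 0..<10, size: 4)
// pieces is [0..<4, 4..<8, 8..<10]
```

Each subrange can then be used to slice the data and search one piece at a time. As for which way is faster, timing both variants on large inputs in a release build (for example inside an XCTest `measure` block) would give a more reliable answer than coverage counts.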


Quincey Morris
2017-12-27 21:50:34 UTC
Post by Daryle Walker
guard let firstBreak = index(where: {
    [MyConstants.cr, MyConstants.lf].contains($0)
}) else { return nil }

guard let firstBreak = index(where: {
    $0 == MyConstants.cr || MyConstants.lf == $0
}) else { return nil }
It *should* be possible for the compiler to optimize your version to this, at least in a release build, but after a certain amount of complexity in the source, it might be asking too much.

Since your test result showed “contains” actually being invoked, you know you have that overhead, and also I wonder if there’s some overhead in setting up the array value for every byte.
Post by Daryle Walker
I wonder if using something like [2]
*My* [2]? I dunno exactly, the difference is that it eliminates the invocation of the “index(where:)” closure for each byte. It’s not clear whether the current (or the upcoming) Swift compiler could actually inline the closure, which would make it a wash, or if the overhead is worth worrying about.

My advice is, as usual, don’t go to extraordinary lengths to optimize something unless you know you have a performance problem, and breaking out of a pure Swift idiom might be a little extraordinary.

Wim Lewis
2018-01-09 02:40:42 UTC
Post by Daryle Walker
I already have the code as an extension of Data. In C, we can use pointer type-punning shenanigans to convert between a dispatch_data_t and NSData*. To trigger this code in a test that I would write in Swift, DispatchData would need to be convertible to Data. Is there a way to do the conversion in Swift? It doesn’t seem obvious since DispatchData and Data are value types, not pointer nor reference types.
let buffer1: [UInt8] = [1,2]
let buffer2: [UInt8] = [3,4,5,6]
var data = DispatchData.empty
data.append (buffer1.withUnsafeBytes { DispatchData (bytesNoCopy: $0) })
data.append (buffer2.withUnsafeBytes { DispatchData (bytesNoCopy: $0) })
let bdata = ((data as Any) as! NSData) as Data
bdata.enumerateBytes { (bufp, pos, stop) in
    print("buffer at index", pos, "is", bufp)
}
Casting the DispatchData to Any causes Swift to lose the static type information, like casting it to (id) in objc. Downcasting that Any to NSData causes it to check that the dynamic type is compatible (a small unnecessary run-time cost, presumably equivalent to [data isKindOfClass:[NSData class]]) and to know that the value is statically an NSData. Then cast it to the Swift Data type (which should be free).

You don't actually need all three casts, you can get by with two, but I think that explicitly casting through NSData makes it clearer to the reader why you're doing this dance, and I don't think it causes any more code to be generated in the actual executable.

If Apple at some point in the future makes DispatchData no longer toll-free-bridgeable to NSData, this will fail at runtime (the as! will trap). Of course it would be preferable for it to fail at compile-time, but until the DispatchData type is correctly annotated, this will have to do.

